Solved – the relationship between orthogonality, correlation and independence


I've read an article saying that, when using planned contrasts to find which means differ in a one-way ANOVA, the contrasts should be orthogonal so that they are uncorrelated, which prevents the type I error rate from being inflated.

I don't understand why orthogonal would mean uncorrelated under any circumstances. I can't find a visual/intuitive explanation of that, so I tried to understand this article and these answers:

https://www.psych.umn.edu/faculty/waller/classes/FA2010/Readings/rodgers.pdf

What does orthogonal mean in the context of statistics?

but to me, they contradict each other. The first says that if two variables are uncorrelated and/or orthogonal then they are linearly independent, but that linear independence does not imply that they are uncorrelated and/or orthogonal.

Now in the second link there are answers that state things like "orthogonal means uncorrelated" and "If X and Y are independent then they are Orthogonal. But the converse is not true".

Another interesting comment in the second link states that the correlation coefficient between two variables equals the cosine of the angle between the two vectors corresponding to these variables, which would imply that two orthogonal vectors are completely uncorrelated (which isn't what the first article claims).

So what is the true relationship between independence, orthogonality and correlation? Maybe I missed something, but I can't figure out what it is.

Best Answer

Independence is a statistical concept. Two random variables $X$ and $Y$ are statistically independent if their joint distribution is the product of the marginal distributions, i.e. $$ f(x, y) = f(x) f(y) $$ if each variable has a density $f$, or more generally $$ F(x, y) = F(x) F(y) $$ where $F$ denotes each random variable's cumulative distribution function.
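
For a concrete toy example: if $X$ and $Y$ are the outcomes of two separate fair coin flips, coded as 0 and 1, then $P(X = x, Y = y) = \tfrac14 = \tfrac12 \cdot \tfrac12 = P(X = x)\,P(Y = y)$ for every combination of $x$ and $y$, so the joint distribution factorizes and the two flips are independent.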

Correlation is a weaker but related statistical concept. The (Pearson) correlation of two random variables is the expectation of the product of the standardized variables, i.e. $$ \newcommand{\E}{\mathbf E} \rho = \E \left [ \frac{X - \E[X]}{\sqrt{\E[(X - \E[X])^2]}} \frac{Y - \E[Y]}{\sqrt{\E[(Y - \E[Y])^2]}} \right ]. $$ The variables are uncorrelated if $\rho = 0$. It can be shown that two random variables that are independent are necessarily uncorrelated, but not vice versa.
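
As a quick illustration of the one-way implication, here is a minimal numpy sketch (the variables are simulated, so the correlations are only approximately zero): an independent pair comes out uncorrelated, while $Y = X^2$ depends completely on $X$ yet is also uncorrelated with it.

```python
# Minimal numpy sketch: independence implies zero correlation,
# but zero correlation does not imply independence.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Independent pair: x and a freshly drawn standard normal z.
z = rng.standard_normal(100_000)
print(np.corrcoef(x, z)[0, 1])   # close to 0, as expected for independent variables

# Dependent but uncorrelated pair: y = x**2 is a deterministic function of x,
# yet cov(X, X^2) = E[X^3] = 0 for a symmetric distribution.
y = x**2
print(np.corrcoef(x, y)[0, 1])   # also close to 0, despite the strong dependence
```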

Orthogonality is a concept that originated in geometry, and was generalized in linear algebra and related fields of mathematics. In linear algebra, orthogonality of two vectors $u$ and $v$ is defined in inner product spaces, i.e. vector spaces with an inner product $\langle u, v \rangle$, as the condition that $$ \langle u, v \rangle = 0. $$ The inner product can be defined in different ways (resulting in different inner product spaces). If the vectors are given in the form of sequences of numbers, $u = (u_1, u_2, \ldots u_n)$, then a typical choice is the dot product, $\langle u, v \rangle = \sum_{i = 1}^n u_i v_i$.
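
As a minimal sketch of this linear-algebra definition (the vectors are chosen arbitrarily for illustration), orthogonality is just a zero dot product:

```python
# Minimal sketch: orthogonality of number sequences as a zero dot product.
import numpy as np

u = np.array([1.0, 1.0, -2.0])
v = np.array([1.0, -1.0, 0.0])
print(np.dot(u, v))  # 0.0, so u and v are orthogonal under the dot product
```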


Orthogonality is therefore not a statistical concept per se, and the confusion you observe is likely due to different translations of the linear algebra concept to statistics:

a) Formally, a space of random variables can be considered as a vector space. It is then possible to define an inner product on that space, in different ways. One common choice is to define it as the covariance: $$ \langle X, Y \rangle = \mathrm{cov} (X, Y) = \E [ (X - \E[X]) (Y - \E[Y]) ]. $$ Since the correlation of two random variables is zero exactly when the covariance is zero, according to this definition uncorrelatedness is the same as orthogonality. (Another possibility is to define the inner product of random variables simply as the expectation of the product.)
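
To see this correspondence concretely, here is a minimal numpy sketch using sample analogues (the data are simulated, so the quantities only approximate their population counterparts): after centering, the dot product of two variables is proportional to their sample covariance, so being "orthogonal" under this inner product is the same as being uncorrelated.

```python
# Minimal sketch (simulated data): with the covariance inner product,
# "orthogonal" and "uncorrelated" coincide. Empirically, the dot product
# of the *centered* sample vectors is proportional to the sample covariance.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
y = 0.5 * x + rng.standard_normal(10_000)   # correlated with x by construction

xc, yc = x - x.mean(), y - y.mean()
print(np.dot(xc, yc) / (len(x) - 1))        # ~0.5, the sample covariance
print(np.cov(x, y)[0, 1])                   # same value via numpy's covariance
```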

b) Not all the variables we consider in statistics are random variables. Especially in linear regression, we have independent variables which are not considered random but predefined. Independent variables are usually given as sequences of numbers, for which orthogonality is naturally defined by the dot product (see above). We can then investigate the statistical consequences of regression models where the independent variables are or are not orthogonal. In this context, orthogonality does not have a specifically statistical definition; indeed, it does not apply to random variables at all.
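
A small sketch of what this looks like in regression (the design matrix below is made up for illustration): when the columns of the design matrix are mutually orthogonal in the dot-product sense, $X^\top X$ is diagonal, which is what gives orthogonal designs their convenient statistical properties, e.g. each coefficient estimate is unaffected by the presence of the other regressors.

```python
# Minimal sketch (made-up design matrix): for non-random regressors, orthogonality
# means the columns of the design matrix have zero dot products, making X'X diagonal.
import numpy as np

X = np.array([
    [1.0,  1.0,  1.0],
    [1.0,  1.0, -1.0],
    [1.0, -1.0,  1.0],
    [1.0, -1.0, -1.0],
])  # intercept plus two factors coded +1/-1: all column dot products are zero

print(X.T @ X)   # diagonal matrix, confirming mutually orthogonal columns
```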

Addition responding to Silverfish's comment: Orthogonality is not only relevant with respect to the original regressors but also with respect to contrasts, because (sets of) simple contrasts (specified by contrast vectors) can be seen as transformations of the design matrix, i.e. the set of independent variables, into a new set of independent variables. Orthogonality for contrasts is defined via the dot product. If the original regressors are mutually orthogonal and one applies orthogonal contrasts, the new regressors are mutually orthogonal, too. This ensures that the set of contrasts can be seen as describing a decomposition of variance, e.g. into main effects and interactions, the idea underlying ANOVA.
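
For concreteness, here is a minimal sketch with a hypothetical balanced one-way design (three groups, two observations each): two standard orthogonal contrast vectors are applied to the group indicator columns, and the resulting regressors are again orthogonal.

```python
# Minimal sketch (hypothetical balanced one-way design): orthogonal contrast
# vectors, applied to the group indicator columns, yield new regressors that
# are again mutually orthogonal.
import numpy as np

# Group membership indicators for 6 observations (2 per group).
G = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

# Two orthogonal contrasts: group 1 vs group 2, and groups 1+2 vs group 3.
c1 = np.array([1.0, -1.0, 0.0])
c2 = np.array([1.0,  1.0, -2.0])
print(np.dot(c1, c2))            # 0.0 -> the contrast vectors are orthogonal

# Transform the design: each contrast becomes a new regressor.
r1, r2 = G @ c1, G @ c2
print(np.dot(r1, r2))            # 0.0 -> the new regressors are orthogonal too
```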

Since according to variant a), uncorrelatedness and orthogonality are just different names for the same thing, in my opinion it is best to avoid using the term orthogonality in that sense. If we want to talk about uncorrelatedness of random variables, let's just say so and not complicate matters by using another word with a different background and different implications. This also frees up the term orthogonality to be used according to variant b), which is highly useful especially in discussing multiple regression. Conversely, we should avoid applying the term correlation to independent variables, since they are not random variables.


Rodgers et al.'s presentation is largely in line with this view, especially as they understand orthogonality to be distinct from uncorrelatedness. However, they do apply the term correlation to non-random variables (sequences of numbers). This only makes sense statistically with respect to the sample correlation coefficient $r$. I would still recommend avoiding this use of the term, unless the number sequence is considered as a sequence of realizations of a random variable.

I've scattered links to the answers to the two related questions throughout the above text, which should help you put them into the context of this answer.
