Solved – Why does Covariance measure only Linear dependence

covariance, linear

1) What is meant by linear dependence?

2) How can I convince myself that covariance measures linear dependence?

3) How can I convince myself that non-linear dependence is not measured by covariance?

Best Answer

A1) Two variables $X$ and $Y$ are linearly dependent if $X = \alpha Y + c$ for some $\alpha,c \in \mathbb{R}$ with $\alpha \neq 0$.

A2) The formula for covariance is:

$$COV(X,Y) = E([X-E(X)][Y-E(Y)]) = E(XY)-E(X)E(Y)$$
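As a quick sanity check that the two forms of the formula agree, here is a minimal sketch in plain Python. The data are made up for illustration (a noisy line with slope 2), and the helper names `mean`, `cov_centered` and `cov_shortcut` are my own, not from any library.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def cov_centered(xs, ys):
    """Population covariance as E([X - E(X)][Y - E(Y)])."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def cov_shortcut(xs, ys):
    """Population covariance as E(XY) - E(X)E(Y)."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical data: X is roughly 2*Y + 1 plus some noise.
ys = [random.gauss(0.0, 1.0) for _ in range(1000)]
xs = [2.0 * y + 1.0 + random.gauss(0.0, 0.5) for y in ys]

# The two numbers should match up to floating-point rounding.
print(cov_centered(xs, ys), cov_shortcut(xs, ys))
```

Both are population (divide by $n$) versions; a sample covariance would divide by $n-1$ instead.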

From A1, suppose some linear relationship $X = \alpha Y + c$ holds, but all we have are the observed data points for each variable. How do we get the value of $\alpha$? It turns out we can instead ask, "how do we draw a line through these points so as to minimise the sum of squared differences between each point and the line?". Doing this analysis for two variables gives a closed-form expression:

$$\alpha = \dfrac{E(XY) -E(Y)E(X)}{E(Y^2) - E(Y)^2}$$

Note that the numerator is the covariance and the denominator is the variance of $Y$, the variable on the right-hand side of $X = \alpha Y + c$. I.e.

$$ \alpha = \dfrac{COV(X,Y)}{E(Y^2) - E(Y)^2}$$
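To see the slope formula at work, here is a small sketch for the line $X = \alpha Y + c$, dividing the covariance by the variance of the right-hand-side variable $Y$. The data are an exact line that I made up ($\alpha = 3$, $c = 2$), and `fit_line` is a hypothetical helper name. The intercept uses the standard least-squares result $c = E(X) - \alpha E(Y)$, which the answer above does not state.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance E(XY) - E(X)E(Y)."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


def fit_line(xs, ys):
    """Least-squares alpha and c for the model X = alpha * Y + c."""
    var_y = covariance(ys, ys)             # E(Y^2) - E(Y)^2
    alpha = covariance(xs, ys) / var_y     # slope = COV(X, Y) / Var(Y)
    c = mean(xs) - alpha * mean(ys)        # intercept from the means
    return alpha, c


# Hypothetical data lying exactly on X = 3*Y + 2.
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [3.0 * y + 2.0 for y in ys]

alpha, c = fit_line(xs, ys)
print(alpha, c)  # recovers the slope and intercept used to build xs
```

With noisy data the same two lines of arithmetic give the best-fitting line rather than an exact one.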

Correlation (e.g. Pearson) is the covariance normalised to give it a comparable, unit-free value: Pearson's coefficient divides the covariance by the product of the two standard deviations, so it always lies between $-1$ and $1$. So you see the entire measure proceeds from the analysis of how to fit a line to some data.
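Here is a rough illustration of why the normalisation matters, assuming the usual definition of Pearson's coefficient as the covariance divided by the product of the standard deviations. The numbers are invented, and `pearson` is my own helper, not a library call. Rescaling one variable (say, metres to centimetres) changes the covariance but leaves the correlation alone.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance E([X - E(X)][Y - E(Y)])."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical, nearly linear data.
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
xs = [2.1, 3.9, 6.2, 7.8, 10.1]
xs_rescaled = [100.0 * x for x in xs]  # same measurements in smaller units

# Covariance grows by the scale factor; correlation does not move.
print(covariance(xs, ys), covariance(xs_rescaled, ys))
print(pearson(xs, ys), pearson(xs_rescaled, ys))
```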

A3) Covariance doesn't measure non-linear relationships for the exact same reason it measures linear ones. Namely, you can think of it as proportional to the slope in a linear equation (e.g. $X=\alpha Y + c$), so when you try to fit a line to a curve, the sum of squared differences between the points and the line may be large, and the best-fitting slope (and hence the covariance) can even be zero although the variables are strongly dependent. Here is a good diagram illustrating the implications. The numbers indicate Pearson's correlation coefficient, whilst the diagrams show the corresponding scatter plots.

[Figure: scatter plots labelled with their Pearson correlation coefficients]
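For a concrete case you can check by hand, take $X = Y^2$ with $Y$ symmetric about zero. $X$ is completely determined by $Y$, yet the covariance comes out as zero because the positive and negative products cancel. This is a sketch with five made-up points, not the data behind the diagram.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance E([X - E(X)][Y - E(Y)])."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical data: Y is symmetric about zero, X is an exact function of Y.
ys = [-2.0, -1.0, 0.0, 1.0, 2.0]
xs = [y * y for y in ys]  # perfect dependence, but not linear

# The best-fitting straight line through these points is flat,
# so the covariance carries no trace of the relationship.
print(covariance(xs, ys))
```

Shifting the $Y$ values so they are no longer symmetric (e.g. using only the non-negative ones) gives a non-zero covariance, which shows that it picks up only the linear trend in the data.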
