Solved – Why does Covariance measure only Linear dependence


1) What is meant by linear dependence?

2) How can I convince myself that covariance measures linear dependence?

3) How can I convince myself that non-linear dependence is not measured by covariance?

Best Answer

A1) Two variables $X$ and $Y$ are linearly dependent if $X = \alpha Y + c$ for some $\alpha,c \in \mathbb{R}$ with $\alpha \neq 0$.

A2) The formula for covariance is:

$$COV(X,Y) = E([X-E(X)][Y-E(Y)]) = E(XY)-E(X)E(Y)$$

From A1, consider some linear relationship $X = \alpha Y + c$, where all we have is the data: individual observations of each variable. How do we get the value of $\alpha$? It turns out we can instead ask the question, "how do we draw a line through these points so as to minimise the sum of squared differences between each point and the line?". When we do this analysis for two variables, fitting $X$ as a linear function of $Y$, we get a closed-form equation that looks like this:

$$\alpha = \dfrac{E(XY) -E(X)E(Y)}{E(Y^2) - E(Y)^2}$$

Please note that the numerator is the covariance and the denominator is the variance of $Y$. I.e.

$$ \alpha = \dfrac{COV(X,Y)}{E(Y^2) - E(Y)^2}$$
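
For completeness, here is a sketch of where such a closed form comes from. It assumes the line is fitted by least squares in the $X$ direction (regressing $X$ on $Y$) and that all second moments are finite; the symbol $VAR$ is shorthand introduced here for the variance:

$$\min_{\alpha,\,c}\; E\big([X - \alpha Y - c]^2\big) \quad\Rightarrow\quad c = E(X) - \alpha E(Y)$$

Substituting this $c$ back in leaves a quadratic in $\alpha$:

$$E\big([(X - E(X)) - \alpha (Y - E(Y))]^2\big) = VAR(X) - 2\alpha\, COV(X,Y) + \alpha^2\, VAR(Y)$$

Setting the derivative with respect to $\alpha$ to zero gives

$$-2\,COV(X,Y) + 2\alpha\, VAR(Y) = 0 \quad\Rightarrow\quad \alpha = \dfrac{COV(X,Y)}{VAR(Y)}, \qquad VAR(Y) = E(Y^2) - E(Y)^2$$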

Correlation (e.g. Pearson's) is the covariance normalised to give a unit-free, comparable value: Pearson's coefficient divides the covariance by the product of the standard deviations of $X$ and $Y$. So you see the entire measure proceeds from the analysis of how to fit a line to some data.
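
A quick numerical check of the link between the fitted slope, covariance and correlation may help. This is only a sketch on simulated data: the seed, sample size and the relation $X = 2Y + 1$ plus noise are arbitrary assumptions chosen for illustration.

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
y <- rnorm(200)
x <- 2 * y + 1 + rnorm(200, sd = 0.5)     # hypothetical linear relation plus noise

# Least-squares slope from regressing x on y
slope_lm <- unname(coef(lm(x ~ y))["y"])

# The same slope written as covariance over variance
slope_cov <- cov(x, y) / var(y)

# Pearson correlation written as normalised covariance
r_manual <- cov(x, y) / (sd(x) * sd(y))

all.equal(slope_lm, slope_cov)            # compares the two slopes
all.equal(r_manual, cor(x, y))            # compares the two correlations
```

Both comparisons should agree up to floating-point tolerance, since the least-squares slope and Pearson's coefficient are both just rescaled versions of the sample covariance.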

A3) Covariance doesn't measure non-linear relationships for the exact same reason it measures linear ones. Namely, you can basically think of it as proportional to the slope in a linear equation (e.g. $X=\alpha Y + c$), so when you try to fit a line to a curve, the sum of squared differences between the points and the line may be large, and the best-fitting slope can even be zero although the variables are clearly dependent. Here is a good diagram illustrating the implications. The numbers indicate Pearson's correlation coefficient, whilst the diagrams show the corresponding scatter plots.
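
A small worked example makes the point concrete. Assume, purely for illustration, that $X$ is symmetric about zero with a finite third moment (e.g. standard normal) and that $Y = X^2$, so $Y$ is completely determined by $X$. Then $E(X) = 0$ and $E(X^3) = 0$, and

$$COV(X, Y) = E(X \cdot X^2) - E(X)E(X^2) = E(X^3) - 0 \cdot E(X^2) = 0$$

The dependence is as strong as it can be, yet the covariance (and hence the slope of the best-fitting line) is exactly zero.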

Related Solutions

Solved – When is distance covariance less appropriate than linear covariance

I have tried to collect a few remarks on distance covariance based on my impressions from reading the references listed below. However, I do not consider myself an expert on this topic. Comments, corrections, suggestions, etc. are welcome.

The remarks are (strongly) biased towards potential drawbacks, as requested in the original question.

As I see it, the potential drawbacks are as follows:

1. The methodology is new. My guess is that this is the single biggest factor regarding lack of popularity at this time. The papers outlining distance covariance start in the mid 2000s and progress up to present day. The paper cited above is the one that received the most attention (hype?) and it is less than three years old. In contrast, the theory and results on correlation and correlation-like measures have over a century of work already behind them.
2. The basic concepts are more challenging. Pearson's product-moment correlation, at an operational level, can be explained to college freshmen without a calculus background pretty readily. A simple "algorithmic" viewpoint can be laid out and the geometric intuition is easy to describe. In contrast, in the case of distance covariance, even the notion of sums of products of pairwise Euclidean distances is quite a bit more difficult and the notion of covariance with respect to a stochastic process goes far beyond what could reasonably be explained to such an audience.
3. It is computationally more demanding. The basic algorithm for computing the test statistic is $O(n^2)$ in the sample size as opposed to $O(n)$ for standard correlation metrics. For small sample sizes this is not a big deal, but for larger ones it becomes more important.
4. The test statistic is not distribution free, even asymptotically. One might hope that, for a test statistic that is consistent against all alternatives, the distribution (at least asymptotically) might be independent of the underlying distributions of $X$ and $Y$ under the null hypothesis. This is not the case for distance covariance, as the distribution under the null depends on the underlying distribution of $X$ and $Y$ even as the sample size tends to infinity. It is true that the distributions are uniformly bounded by a $\chi^2_1$ distribution, which allows for the calculation of a conservative critical value.
5. The distance correlation is a one-to-one transform of $|\rho|$ in the bivariate normal case. This is not really a drawback, and might even be viewed as a strength. But, if one accepts a bivariate normal approximation to the data, which can be quite common in practice, then little, if anything, is gained from using distance correlation in place of standard procedures.
6. Unknown power properties. Being consistent against all alternatives essentially guarantees that distance covariance must have very low power against some alternatives. In many cases, one is willing to give up generality in order to gain additional power against particular alternatives of interest. The original papers show some examples in which they claim high power relative to standard correlation metrics, but I believe that, going back to (1.) above, its behavior against alternatives is not yet well understood.
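
To make the "sums of products of pairwise distances" and the $O(n^2)$ cost concrete, here is a minimal sketch of the sample distance covariance and distance correlation for two univariate samples, based on double-centred pairwise distance matrices. The helper names `double_center`, `dcov2` and `dcor` are hypothetical and written only for illustration; this is not a reference implementation and it is not tuned for large samples.

```r
# Double-centre a distance matrix: subtract row means and column means,
# then add back the grand mean.
double_center <- function(D) {
  sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
}

# Squared sample distance covariance: the mean of the elementwise product
# of the two double-centred n x n distance matrices (hence O(n^2) work).
dcov2 <- function(x, y) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  mean(A * B)
}

# Sample distance correlation
dcor <- function(x, y) {
  sqrt(dcov2(x, y) / sqrt(dcov2(x, x) * dcov2(y, y)))
}

x <- seq(-4, 4, by = 0.1)
y <- x^2            # perfect but non-linear dependence
cor(x, y)           # Pearson correlation, essentially zero here
dcor(x, y)          # distance correlation, clearly positive
```

Both distance matrices have $n^2$ entries, which is where the quadratic cost in the sample size comes from, whereas Pearson's correlation needs only a single pass over the $n$ pairs.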

To reiterate, this answer probably comes across as quite negative, but that is not the intent. There are some very beautiful and interesting ideas related to distance covariance, and its relative novelty also opens up research avenues for understanding it more fully.

References:

G. J. Szekely and M. L. Rizzo (2009), Brownian distance covariance, Ann. Appl. Statist., vol. 3, no. 4, 1236–1265.
G. J. Szekely, M. L. Rizzo and N. K. Bakirov (2007), Measuring and testing independence by correlation of distances, Ann. Statist., vol. 35, 2769–2794.
R. Lyons (2012), Distance covariance in metric spaces, Ann. Probab. (to appear).

Solved – Difference between correlation and covariance: is covariance only useful if the relation is linear

As @RichardHardy points out in his comment, correlation is simply scaled covariance. So, they are useful for exactly the same types of relationships, but correlations are comparable across different relationships and correlations will not be affected by choice of units, while covariances will.

set.seed(123)
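htin <- rnorm(100,68,3)                # height in inches
wtpound <- htin*2.5 + rnorm(100,0,5)   # weight in pounds, linear in height plus noise
htm <- htin*0.0254                     # height converted to metres
wtkg <- wtpound/2.2                    # weight converted to kilograms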

cor(htin,wtpound) #0.81
cov(htin,wtpound) #18.09

cor(htm,wtkg) #0.81
cov(htm,wtkg) #0.21
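
The numbers above follow directly from how covariance and correlation respond to a change of units. As a short worked check, with $a$ and $b$ denoting generic non-zero scale factors (notation introduced here, not in the original answer):

$$COV(aX, bY) = ab\,COV(X,Y), \qquad \operatorname{cor}(aX, bY) = \dfrac{ab\,COV(X,Y)}{|a|\,\sigma_X\,|b|\,\sigma_Y} = \operatorname{sign}(ab)\,\operatorname{cor}(X,Y)$$

With $a = 0.0254$ (inches to metres) and $b = 1/2.2$ (pounds to kilograms), the covariance becomes $18.09 \times 0.0254 / 2.2 \approx 0.21$, while the correlation stays at $0.81$ because both factors are positive.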

If you have a perfect U-shaped relation that is symmetric about the mean of x, both cov and cor will be 0 (up to floating-point error):

x <- seq(-4,4,by = 0.1)
y <- x^2
cor(x,y) #1.63*10^-16
cov(x,y) #1.89*10^-15
