Solved – Correlation with a constant

correlation, r

I am trying to compute the correlation between two variables in a given data set.
Once in a while, one of the variables is constant in the data. Because its standard deviation is then zero, I get NA for the correlation (in R).
I would like to assign a value to the correlation explicitly in these cases, or obtain one by some other means, so that I can compare this result with the correlations I compute at other times.
How do I go about it?
(1) Should I add some noise to that variable and compute the correlation again? Would that be a meaningful thing to do?

Best Answer

Recall that correlation is defined as

$$ \rho_{X,Y}= \frac{\sigma(X,Y)}{\sigma_X \sigma_Y} $$

This means that if one of your "variables" is constant, it is not really a variable: its variance is zero, so its correlation with anything is undefined (you would be dividing by zero).
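To make the division by zero concrete, here is a minimal sketch in plain Python (the question uses R, where the correlation comes back as NA in this situation). The helper name `pearson` and the sample data are made up for illustration; the helper returns `None` where the correlation is undefined.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation, or None when it is undefined."""
    # A constant input has zero variance, so the ratio would be 0 / 0.
    # Comparing min and max avoids rounding issues in the computed mean.
    if min(x) == max(x) or min(y) == max(y):
        return None
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

constant = [3.0, 3.0, 3.0, 3.0]   # illustrative data, not from the question
varying = [1.0, 2.0, 4.0, 8.0]

print(pearson(constant, varying))  # None
print(pearson(varying, varying))   # a variable is perfectly correlated with itself
```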

The standard deviation of a variable $X$ plus a constant $c$ is the same as the standard deviation of $X$:

$$ \sigma(X + c) = \sigma(X) $$

The same holds for covariance:

$$ \sigma(X + c, Y) = \sigma(X, Y) $$

So adding noise to your constant "variable" would amount to measuring the correlation of the noise with the other variable (here your "variable" is $c$ and the noise is $X$).
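The point about noise can be checked numerically. The sketch below, in plain Python with a hypothetical helper `pearson` and arbitrary simulated data, jitters a constant and compares the result with the correlation of the noise alone. Under these assumptions the two values agree up to floating-point error.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation (assumes neither input is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)                       # arbitrary seed, for reproducibility only
y = [random.gauss(0, 1) for _ in range(200)]
noise = [random.gauss(0, 0.01) for _ in range(200)]

c = 5.0                              # the constant "variable"
jittered = [c + e for e in noise]    # constant plus noise

r_jittered = pearson(jittered, y)
r_noise = pearson(noise, y)

# Both numbers describe the noise, not the constant.
print(r_jittered, r_noise)
```

Changing the seed changes the result, which is another sign that the number reflects the random draw and not your data.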

On the other hand, the covariance of a random variable with a constant is zero

$$ \sigma(Y, c) = 0 $$

and a constant random variable is independent of any other random variable. So if you really need to redefine correlation for this case, the best choice would be $0$. Notice, however, that as noted by Nick Cox in the comment below, this does not solve any of your problems.
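If you do decide to redefine the correlation for constant input, one possible way is a small wrapper that returns a fallback value. This is only a sketch in plain Python: the names `pearson` and `correlation_or_default` and the `default` parameter are hypothetical, and the fallback of 0 is the convention discussed above, not something a library does for you.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation (assumes neither input is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_or_default(x, y, default=0.0):
    """Return `default` when either input is constant, else the correlation."""
    if min(x) == max(x) or min(y) == max(y):
        return default  # a convention chosen by the caller, not a computed value
    return pearson(x, y)

print(correlation_or_default([2, 2, 2], [1, 2, 3]))                # 0.0
print(correlation_or_default([2, 2, 2], [1, 2, 3], default=None))  # None
print(correlation_or_default([1, 2, 3], [2, 4, 6]))
```

Passing `default=None` keeps the undefined cases distinguishable from a genuinely computed zero, which may matter when you compare results across time points.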

The basic problem with a constant random variable is that it is independent of everything else, so it adds nothing to your analysis. Because of this, many software packages return an error when given a constant variable, or drop it from the analysis automatically. In R, the correlation comes back as NA, as in the question, and that behavior is consistent with the definition of correlation.
