What method can I use to test whether there is a correlation between two sets of data? The correlation coefficient works when the association is linear, but if I have two sets that are clearly (visually, from a graph) correlated in a non-linear way, how can I test that? Is there a coefficient or a special method?
Solved – Correlation coefficient for sets with non-linear correlation
Tags: correlation, nonlinear, nonparametric
Related Solutions
Is it not telling that this was published in a non-statistical journal whose statistical peer review we are unsure of? This problem was solved by Hoeffding in 1948 (Annals of Mathematical Statistics 19:546), who developed a straightforward algorithm requiring no binning and no multiple steps. Hoeffding's work was not even referenced in the Science article. It has been available for many years as the hoeffd function in the R Hmisc package. Here's an example (type example(hoeffd) in R):
# Hoeffding's test can detect even one-to-many dependency
set.seed(1)
x <- seq(-10, 10, length = 200)
y <- x * sign(runif(200, -1, 1))
plot(x, y)     # an "X" pattern
hoeffd(x, y)   # also accepts a numeric matrix

D
     x    y
x 1.00 0.06
y 0.06 1.00

n= 200

P
  x y
x   0    # P-value is very small
y 0
hoeffd uses a fairly efficient Fortran implementation of Hoeffding's method. The basic idea of his test is to consider the difference between the joint ranks of X and Y and the product of the marginal rank of X and the marginal rank of Y, suitably scaled.
Update
I have since been corresponding with the authors (who are very nice by the way, and are open to other ideas and are continuing to research their methods). They originally had the Hoeffding reference in their manuscript but cut it (with regrets, now) for lack of space. While Hoeffding's $D$ test seems to perform well for detecting dependence in their examples, it does not provide an index that meets their criteria of ordering degrees of dependence the way the human eye is able to.
In an upcoming release of the R Hmisc package I've added two additional outputs related to $D$, namely the mean and max of $|F(x,y) - G(x)H(y)|$, which are useful measures of dependence. However, these measures, like $D$, do not have the property that the creators of MIC were seeking.
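As a rough sketch (not the Hmisc implementation), those two quantities can be approximated directly from empirical CDFs evaluated at the observed points; the data below reuse the "X"-shaped example from earlier:

```r
# Rough sketch (not the Hmisc implementation): empirical versions of the
# mean and max of |F(x,y) - G(x)H(y)|, evaluated at the observed points.
set.seed(1)
x <- seq(-10, 10, length = 200)
y <- x * sign(runif(200, -1, 1))

n   <- length(x)
Fxy <- sapply(seq_len(n), function(i) mean(x <= x[i] & y <= y[i]))  # joint ECDF
Gx  <- ecdf(x)(x)   # marginal ECDF of x at the data points
Hy  <- ecdf(y)(y)   # marginal ECDF of y at the data points

d <- abs(Fxy - Gx * Hy)
mean(d)   # average departure from independence
max(d)    # worst-case departure
```

Under independence both quantities stay near zero; strong dependence, even the one-to-many kind in this example, pushes them away from zero.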
You can, for example, generate data from a bivariate normal distribution, where the off-diagonal entry of the variance-covariance matrix is the covariance. In R, this can readily be done with rmvnorm from the mvtnorm package.
Example Generate $1000$ realisations from $X=(X_{1}, X_{2})' \sim N(\mu, \Sigma)$ with $$\mu = (-1, 5)', \quad \Sigma_{11} = V(X_{1}) = 0.7, \quad \Sigma_{22}= V(X_{2}) = 0.1$$ and $\Sigma_{12} = \Sigma_{21} = \textrm{Cov}(X_1, X_2)$ such that $\textrm{Cor}(X_{1}, X_{2})=0.85$.
> #------load the package------
> library(mvtnorm)
> #----------------------------
>
> #------compute the covariance such that cor(X1, X2) = 0.85------
> covariance <- 0.85 * sqrt(0.7) * sqrt(0.1)
> #---------------------------------------------------------------
>
> #------variance-covariance matrix------
> sigma <- matrix(c(0.7, covariance, covariance, 0.1), nrow=2, byrow=TRUE)
> sigma
          [,1]      [,2]
[1,] 0.7000000 0.2248889
[2,] 0.2248889 0.1000000
> #--------------------------------------
>
> #------data generation------
> test <- rmvnorm(n=1000, mean=c(-1, 5), sigma=sigma)
> #---------------------------
>
> #------compute the empirical correlation on this particular data------
> cor(test[, 1], test[, 2])
[1] 0.8478849
> #---------------------------------------------------------------------
NB: You can also generate data according to a linear regression model: $X_2 = a + bX_1 + \epsilon$.
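A sketch of that regression route, with parameters chosen (as an assumption, to match the bivariate-normal example above) so that $V(X_1)=0.7$, $V(X_2)=0.1$, and $\textrm{Cor}(X_1,X_2)=0.85$:

```r
# Sketch: generate correlated data via X2 = a + b*X1 + eps.
# Parameters are chosen to match the example above (an assumption):
# V(X1) = 0.7, V(X2) = 0.1, Cor(X1, X2) = 0.85, means (-1, 5).
set.seed(1)
rho <- 0.85
v1  <- 0.7
v2  <- 0.1
b   <- rho * sqrt(v2 / v1)   # slope: Cov(X1, X2) / V(X1)
a   <- 5 - b * (-1)          # intercept so that E[X2] = 5
x1  <- rnorm(1000, mean = -1, sd = sqrt(v1))
x2  <- a + b * x1 + rnorm(1000, sd = sqrt(v2 * (1 - rho^2)))
cor(x1, x2)   # close to 0.85
```

The slope and error variance follow from $\textrm{Cor}(X_1,X_2) = b\,\sigma_1 / \sqrt{b^2\sigma_1^2 + \sigma_\epsilon^2}$, so the two recipes target the same joint distribution.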
Best Answer
In the case of a non-linear but monotonic association, Spearman's rank correlation is one option; another is Kendall's tau. Both are rank-based, so they detect monotonic dependence rather than arbitrary non-linear dependence. In R, both are available via cor(x, y, method = "spearman") and cor(x, y, method = "kendall").
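A minimal sketch of both in base R, using cor and cor.test; the toy data here (an exponential trend with noise, an illustrative assumption) are monotone but clearly non-linear:

```r
# Toy data: monotone but non-linear relationship (illustrative assumption)
set.seed(1)
x <- runif(100, 0, 5)
y <- exp(x) + rnorm(100)   # y increases non-linearly with x

# Spearman's rank correlation
cor(x, y, method = "spearman")        # coefficient only
cor.test(x, y, method = "spearman")   # coefficient plus a p-value

# Kendall's tau
cor(x, y, method = "kendall")
cor.test(x, y, method = "kendall")
```

Because both coefficients depend only on ranks, they are invariant to any monotone transformation of x or y, which is exactly why they handle this kind of non-linearity while Pearson's coefficient understates it.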