MATLAB: Getting a NaN in correlation coefficient

Hi, I have a simple problem which unfortunately I am unable to understand.

I have matrices and I am trying to calculate the correlation coefficient between two variables. A simple example from my code is attached. Why am I getting a NaN here? What does this imply?

x=[-7.501899598769999514e-04;-6.501899598769999514e-04;-5.501899598769999514e-04];
y=[-0.414;-0.414;-0.414];
c11=corr2(x,y)

Best Answer

When NaNs appear in the output but are not present in the inputs

Notice that all of the values in y are identical: y=[-0.414; -0.414; -0.414];

If you look at the equations for corr2() or Pearson's corr() you'll notice that both have a term in the denominator that subtracts the mean of y from each y-value. When each value of y is identical, the result is a vector of 0s, so the denominator is 0. The numerator contains the same differences, so it is 0 as well, and 0/0 evaluates to NaN.

Put another way, the correlation coefficient is undefined when the standard deviation of x or y is 0, and the std of a vector of identical values is 0.

The NaN, in this case, means the correlation between the two variables is undefined rather than zero. The correlation describes how much one variable changes as the other variable changes. That requires both variables to change.
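
To see where the 0/0 comes from, here is a minimal sketch that evaluates the Pearson formula by hand on the vectors from the question. The variable names (dx, dy, num, den) are introduced here for illustration, and this is not the actual source of corr2(), only the same formula written out.

```matlab
x = [-7.501899598769999514e-04; -6.501899598769999514e-04; -5.501899598769999514e-04];
y = [-0.414; -0.414; -0.414];

dx = x - mean(x);                       % deviations of x from its mean
dy = y - mean(y);                       % all zeros because y is constant

num = sum(dx .* dy);                    % numerator: 0
den = sqrt(sum(dx.^2) * sum(dy.^2));    % denominator: 0

r = num / den                           % 0/0 evaluates to NaN
```

Checking std(y) == 0 before calling corr2() or corr() is one simple way to catch this case up front.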

NaN values in the inputs spreading to the outputs

For r=corr2(x,y):

When there are one or more NaN values in the inputs to corr2(x,y), the output will be NaN. Fill in the missing data before computing the 2D correlation coefficient.
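
As a rough sketch of what filling in the missing data could look like, the example below uses fillmissing() with linear interpolation. The data are made up, and linear interpolation is only an assumption here; the appropriate fill method depends on your data. fillmissing() requires R2016b or later, and corr2() is part of the Image Processing Toolbox.

```matlab
% Hypothetical data: x has one missing value
x = [1; 2; NaN; 4; 5];
y = [2; 4; 6; 8; 11];

r_nan = corr2(x, y)                 % NaN, because x contains a NaN

% Assumption: linear interpolation is an acceptable fill for this data
xf = fillmissing(x, 'linear');      % replaces the NaN with 3

r_filled = corr2(xf, y)             % finite value
```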

For r=corr(x):

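A single NaN value in position (i,j) of input matrix x will result in a full row of NaN values at row j and a full column of NaN values in column j of the output matrix r, because every correlation that involves column j of x is affected (see explanation). In the example below the NaN is at (2,2), so row 2 and column 2 of r are NaN.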

x = [
     6     5     1
     3   NaN     9
     5     3     7
     9     5     5 ];
 
 r = corr(x)
            1          NaN     -0.52699
          NaN          NaN          NaN
     -0.52699          NaN            1

For r=corr(x,y):

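Element r(a,b) of the output is the correlation between column a of x and column b of y. A single NaN value in position (i,j) of y will therefore result in a column of NaN values in column j of the output matrix r, while a single NaN value in position (i,j) of x will result in a row of NaN values in row j of r. In the example below the NaN is in y.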

x = [
     9     5     1
     1     4     4
     2     6     4
     2     5     9 ];
y = [
     6     5     1
     3   NaN     9   
     5     3     7
     9     5     5 ];
 
 r = corr(x,y)
       0.1623          NaN     -0.92394
       0.3266          NaN     -0.23905
      0.62312          NaN      0.32367
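
For comparison, here is a short sketch with the same two matrices passed in the opposite order, so that the matrix containing the NaN is the first input. Since r(a,b) pairs column a of the first input with column b of the second, the NaN is expected to show up along a row of the output instead of a column. The names r_yx, nanRows, and nanCols are introduced here for illustration.

```matlab
x = [9 5 1; 1 4 4; 2 6 4; 2 5 9];
y = [6 5 1; 3 NaN 9; 5 3 7; 9 5 5];

% r(a,b) pairs column a of the first input with column b of the second
r_yx = corr(y, x)

nanRows = find(all(isnan(r_yx), 2))   % rows of r_yx that are entirely NaN
nanCols = find(all(isnan(r_yx), 1))   % columns of r_yx that are entirely NaN
```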

Ignoring NaNs in corr() inputs

The 'rows' option in corr() can be set to 'complete' or 'pairwise', each of which ignores NaN values in a different way.

'rows','complete' removes the entire row if the row contains a NaN. In other words, it will remove row 2 from both x and y input matrices. Using the same inputs above,

r = corr(x,y,'rows','complete')
     -0.27735          0.5     -0.94491
     -0.69338           -1      0.75593
      0.81224      0.14286      0.53995
      
      
r2 = corr(x,y) % for comparison

       0.1623          NaN     -0.92394
       0.3266          NaN     -0.23905
      0.62312          NaN      0.32367

Notice that this changes all of the correlation values since the entire row #2 was removed from both inputs x and y. To confirm that, we can remove those rows and recompute the correlation matrix.

% Remove row 2 which contains a NaN in y
r3 = corr(x([1,3,4],:), y([1,3,4],:))
     -0.27735          0.5     -0.94491
     -0.69338           -1      0.75593
      0.81224      0.14286      0.53995

Voila! Outputs r and r3 match.

'rows','pairwise' removes a row only if a NaN appears in the pairing of two columns. For the same x, y inputs as above, the correlations of the columns in x paired with the 2nd column in y will omit the row containing the NaN and will be based on the remaining 3 values. All other column-paired correlations will use all 4 rows of values.

r = corr(x,y,'rows','pairwise')
       0.1623          0.5     -0.92394
       0.3266           -1     -0.23905
      0.62312      0.14286      0.32367
      
      
r2 = corr(x,y) % for comparison
       0.1623          NaN     -0.92394
       0.3266          NaN     -0.23905
      0.62312          NaN      0.32367

Notice that values in columns 1 and 3 haven't changed since they do not involve column #2 in y. To confirm the correlation values in column 2 of r,

% Remove row 2 which contains a NaN in y
r3 = corr(x([1,3,4],:) ,y([1,3,4],:)); 
% Replace NaN column in r2 with new r values
r2(:,2) = r3(:,2)
       0.1623          0.5     -0.92394
       0.3266           -1     -0.23905
      0.62312      0.14286      0.32367

Voila! Updated output r2 matches r.

Related Solutions

MATLAB: Does corr function result depend on the number of columns

Understanding the output of rho=corr(x)

r = corr(x,x);

is the same as

r = corr(x);

If a single value is changed in x in column n, it should affect all of the correlation matrix results in row n and column n.

For example,

% x is an nx3 matrix
% r = corr(x)
% r shows the correlation between 
% the following column pairs:
r =        
      1 & 1     1 & 2     1 & 3
      2 & 1     2 & 2     2 & 3
      3 & 1     3 & 2     3 & 3

If a value changes in column 3, you can see above that it would affect all values in column 3 and row 3 of the correlation matrix.

Here's a demo

x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = x0;
x1(10) = 9;
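
The demo stops after building x1. One way it could be finished is to compare the two correlation matrices; this is a sketch, not the original author's code, and r0, r1, and changed are names introduced here. Linear index 10 in a 4x3 matrix is row 2, column 3, so only column 3 of x differs between x0 and x1.

```matlab
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = x0;
x1(10) = 9;                          % linear index 10 is x1(2,3)

r0 = corr(x0);
r1 = corr(x1);

% Logical mask of entries that differ by more than round-off
changed = abs(r1 - r0) > 1e-12
```

The mask should be true only in row 3 and column 3, except on the diagonal, where r(3,3) stays at 1.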

NaN infestation

As explained in this answer, a single NaN value in the input matrix of r=corr(x) at x(i,j) will result in all NaN values in row j and column j of the output matrix.

For r=corr(x,y), a single NaN value at coordinate (i,j) of y will result in a column of NaN values in column j of the output matrix, and a single NaN value at coordinate (i,j) of x will result in a row of NaN values in row j. The rest of the output matrix will otherwise be OK.

Ignoring missing values (e.g. NaN).

As explained in this answer, to compute column-wise correlation while ignoring missing values, set the 'Rows' property to either 'complete' or 'pairwise'.

MATLAB: Correlation between two row matrices

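When "a" and "b" are row vectors, corr(a,b) treats each column as a separate variable with a single observation. Like that, each value of "a" is correlated to each value of "b", but applying the formula of the correlation, the correlation of two single numbers is NaN. To compute the correlation correctly, transpose the input vectors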

result  = corr(a', b');
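
A small sketch of the difference, using made-up row vectors (a, b, r_rows, and r_cols are illustrative names). Based on the explanation above, the untransposed call is expected to return a matrix of NaNs with one entry per pair of columns, while the transposed call returns a single coefficient.

```matlab
% Hypothetical row vectors
a = [1 2 3 4];
b = [2 4 5 9];

r_rows = corr(a, b);       % each column is a variable with one observation
size(r_rows)               % 4-by-4 matrix, one entry per pair of columns
all(isnan(r_rows(:)))      % true if every entry is NaN

r_cols = corr(a', b')      % one variable each, four observations: a scalar
```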
