MATLAB: Getting different p-values with corrcoef and regress functions


I'm having trouble understanding the differences between the p-values returned from corrcoef and regress. I have a large data set with variables in the columns and instances in the rows. I determine which variables have significant relationships using corrcoef like this:

[R,P,RLO,RUP] = corrcoef(binaryDataXlx,'rows','pairwise');  % NaNs handled pair by pair
[sigx,sigy] = find(P < 0.05);   % row/column indices of pairs with p < 0.05
sig = [sigx sigy];

Then, for the pairs that are significant, I remove the NaN entries and use regress to find the linear regression coefficients:

[B,BINT,Rregress,RINT,STATS] = regress(tempPruned(:,2),Xregress,0.05);

My main problem is that the p-values from corrcoef do not agree with the p-value in STATS(3). Is this because corrcoef compares all relationships, not just linear ones? And if so, why are the p-values from regress sometimes smaller than those from corrcoef?

Thank you so much!

Matthew

Best Answer

This is difficult to understand without seeing the code and data.

Here is a simple example with one predictor, in which the p-values do agree:

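rng default                                    % reproducible random numbers
x = (0:0.05:1)';
y = x + 8*rand(size(x,1),1);
figure
plot(x,y,'.')
[b,bint,r,rint,stats] = regress(y,[ones(size(x)) x])   % column of ones = intercept
stats(3)                                       % p-value of the regression F-test
[R,P] = corrcoef(y,x)                          % R, P so the residuals r are not overwritten
P(1,2)                                         % p-value for corr(y,x); matches stats(3)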
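On the question of whether corrcoef picks up nonlinear relationships: it computes the Pearson correlation coefficient, which measures linear association only. A quick sketch with made-up data (the line and the parabola are arbitrary choices for illustration) shows a perfect but symmetric nonlinear relationship giving r close to 0 and a p-value close to 1:

```matlab
% Sketch: corrcoef (Pearson r) measures linear association only.
% The data are made up for illustration.
x = linspace(-1, 1, 21)';   % symmetric about zero
yLinear = 2*x + 1;          % exact linear relationship
yQuad   = x.^2;             % exact, but nonlinear and symmetric, relationship

[Rlin,  Plin ] = corrcoef(x, yLinear);
[Rquad, Pquad] = corrcoef(x, yQuad);

% r is (numerically) 1 for the line but about 0 for the parabola, and the
% parabola's p-value is close to 1 even though y is an exact function of x.
fprintf('linear:    r = %.3f, p = %.3g\n', Rlin(1,2),  Plin(1,2));
fprintf('quadratic: r = %.3f, p = %.3g\n', Rquad(1,2), Pquad(1,2));
```

So a small corrcoef p-value indicates a linear trend, not "any" relationship.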

Maybe you could craft a simple example showing your problem? Specifically, if you have more than one predictor, I'm not sure how you are comparing the single P-value of the F-statistic of the regression with the many P-values of corrcoef.
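With more than one predictor the two functions test different hypotheses: stats(3) from regress is the p-value of the overall F-test that all slope coefficients are zero, while each off-diagonal entry of P from corrcoef tests a single pairwise correlation and ignores the other predictors. The sketch below uses made-up data (x1, x2, the coefficients and the noise level are arbitrary assumptions) only to show the shape of each output; there is no reason for these numbers to agree.

```matlab
% Sketch with two made-up predictors: regress returns ONE p-value (overall
% F-test), while corrcoef returns one p-value per pair of variables.
rng default
n  = 30;
x1 = rand(n,1);
x2 = rand(n,1);
y  = 2*x1 + 0.5*x2 + randn(n,1);

X = [ones(n,1) x1 x2];                  % column of ones = intercept term
[~,~,~,~,stats] = regress(y, X);
pOverall = stats(3);                    % H0: both slopes are zero

[~, P] = corrcoef([y x1 x2]);
pY_x1 = P(1,2);                         % H0: corr(y,x1) = 0, ignoring x2
pY_x2 = P(1,3);                         % H0: corr(y,x2) = 0, ignoring x1

fprintf('F-test p = %.4g, corr(y,x1) p = %.4g, corr(y,x2) p = %.4g\n', ...
        pOverall, pY_x1, pY_x2);
```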

Related Solutions

MATLAB: Linear regression model with fitlm

You may have to do a separate anova call to get it:

Anova = anova(correlation);
AnovaP = Anova.pValue(2);

That works for your model.

(I am usually interested in the coefficient statistics, which are generally easier to recover.)
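As a sketch of where fitlm keeps these numbers (made-up data; the variable names are illustrative, not from the original thread): the per-coefficient t-test p-values are in the pValue column of mdl.Coefficients, and coefTest(mdl) with no other arguments returns the p-value of the F-test that all coefficients except the intercept are zero, which for an ordinary least-squares fit is the same overall p-value that regress reports in stats(3).

```matlab
% Sketch: p-values from a fitlm model (Statistics and Machine Learning Toolbox).
% The data and variable names are made up for illustration.
rng default
n  = 30;
x1 = rand(n,1);
x2 = rand(n,1);
y  = 2*x1 + 0.5*x2 + randn(n,1);

mdl = fitlm([x1 x2], y);                 % intercept included by default

coefP = mdl.Coefficients.pValue;         % t-test p-value for each coefficient
pF    = coefTest(mdl);                   % F-test: all non-intercept coefficients = 0

% Same overall test computed with regress, for comparison.
[~,~,~,~,stats] = regress(y, [ones(n,1) x1 x2]);
fprintf('coefTest p = %.4g, regress stats(3) = %.4g\n', pF, stats(3));
disp(table(mdl.CoefficientNames', coefP, 'VariableNames', {'Term','pValue'}))
```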

MATLAB: How to extract p-value from regress and corrcoef

From the documentation for regress:

[b,bint,r,rint,stats] = regress(y,X) returns a 1-by-4 vector stats that contains, in order, the R² statistic, the F statistic and its p value, and an estimate of the error variance.
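A minimal sketch of unpacking the four elements of stats, using made-up data. It also shows one caveat from the regress documentation that may matter when comparing against corrcoef: the F statistic and its p-value are computed assuming X includes a column of ones, and they are not correct for a model without a constant term. Whether a given design matrix (such as Xregress in the question) has that column is an assumption to check, since it is not shown.

```matlab
% Sketch: unpack the 1-by-4 stats vector from regress (made-up data).
rng default
x = (0:0.05:1)';
y = x + 8*rand(size(x));

X = [ones(size(x)) x];                   % include a column of ones (intercept)
[b,~,~,~,stats] = regress(y, X);
R2     = stats(1);                       % R^2 statistic
F      = stats(2);                       % F statistic
pValue = stats(3);                       % p-value of the F statistic
errVar = stats(4);                       % estimate of the error variance

% Without the column of ones regress still returns stats (and warns), but per
% its documentation R^2, F and the p-value are not correct for such a model.
[~,~,~,~,statsNoConst] = regress(y, x);
fprintf('p with intercept = %.4g, p without = %.4g\n', pValue, statsNoConst(3));
```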

From the documentation for corrcoef:

[R,P]=corrcoef(...) also returns P, a matrix of p-values for testing the hypothesis of no correlation. Each p-value is the probability of getting a correlation as large as the observed value by random chance, when the true correlation is zero. If P(i,j) is small, say less than 0.05, then the correlation R(i,j) is significant.
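Because P is symmetric, searching the full matrix with find(P < 0.05) reports every significant pair twice, once as (i,j) and once as (j,i). A minimal sketch, with made-up data in which columns 1 and 2 are deliberately correlated, that keeps only the upper triangle so each pair is listed once:

```matlab
% Sketch: list each significantly correlated pair of columns once (made-up data).
rng default
data = randn(50,4);
data(:,2) = data(:,1) + 0.5*randn(50,1);  % make columns 1 and 2 correlated
data(3,4) = NaN;                          % a missing value

[R, P] = corrcoef(data, 'Rows', 'pairwise');

% Keep the strict upper triangle: no diagonal, no mirrored duplicates.
alpha  = 0.05;
mask   = triu(P < alpha, 1);
[i, j] = find(mask);                      % column-major order, same as R(mask)
sigPairs = table(i, j, R(mask), P(mask), ...
                 'VariableNames', {'Var1','Var2','r','p'});
disp(sigPairs)
```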

I can’t describe it better than the documentation does.
