Why does the square of the Pearson correlation not match the R2 score?

correlation, python, regression, regression analysis

According to the linked derivation relating correlation and R2, the square of the Pearson correlation is mathematically equal to R2. However, when I try to replicate this with my data in Python, I do not get the same result. Can somebody explain what mistake I am making?

To calculate the Pearson correlation, I use the following two methods in Python, both of which give a correlation of 0.0818:

import numpy as np
# Pearson correlation is the off-diagonal entry of the 2x2 correlation matrix
x = np.corrcoef(y_df, yhat)
x[0, 1]


from statistics import stdev
sd1 = stdev(y_df)
sd2 = stdev(yhat)
# np.cov returns a 2x2 matrix; the off-diagonal entry is the sample covariance
cov = np.cov(yhat, y_df)[0, 1]
print(cov / (sd1 * sd2))

To calculate the R2 score, I again use two different methods, both of which give the same value of 0.0001707:

from sklearn.metrics import r2_score
print(r2_score(y_df,yhat))

ssr = ((y_df-yhat)**2).sum()
tss = ((y_df-y_df.mean())**2).sum()
print(1-ssr/tss)

Here, the square of the Pearson coefficient (0.0818 squared is about 0.0067) is not equal to R2 (0.0001707). I do not understand which of my assumptions is wrong. Any help is appreciated.

Best Answer

Copying my comment:

Is yhat the result of a linear regression? The Pearson correlation is defined for any paired samples of two variables, but $R^2 = 1 - \frac{SSR}{TSS}$ is only defined in the context of linear regression; in particular, the equality only holds when the sample covariance between yhat and y_df - yhat is zero.
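To make the condition concrete, here is a minimal sketch on synthetic data (the seed, sample size, and coefficients are arbitrary assumptions, not the asker's data). It compares the squared Pearson correlation with $1 - \frac{SSR}{TSS}$ for two sets of predictions: fitted values from ordinary least squares with an intercept, where the residuals have zero mean and zero sample covariance with the fitted values, and a shifted and rescaled copy of those fitted values, which has exactly the same correlation with y but is no longer a least-squares fit. The helper names r2 and pearson_sq are made up for this illustration.

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination computed as 1 - SSR/TSS."""
    ssr = np.sum((y - yhat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ssr / tss

def pearson_sq(y, yhat):
    """Squared Pearson correlation between y and yhat."""
    return np.corrcoef(y, yhat)[0, 1] ** 2

# Synthetic data (assumption: any roughly linear relationship would do)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Case 1: fitted values from least squares with an intercept
slope, intercept = np.polyfit(x, y, 1)
yhat_ols = slope * x + intercept

# Case 2: the same predictions rescaled and shifted, so not a least-squares fit
yhat_other = 0.5 * yhat_ols + 3.0

# Residuals of the least-squares fit are uncorrelated with the fitted values
print("cov(yhat, resid):", np.cov(yhat_ols, y - yhat_ols)[0, 1])

# These two numbers agree up to floating-point error
print("OLS:  ", pearson_sq(y, yhat_ols), r2(y, yhat_ols))

# Correlation is unchanged by the linear transformation, but R^2 is not
print("other:", pearson_sq(y, yhat_other), r2(y, yhat_other))
```

If yhat in the question comes from anything other than a least-squares fit on the same data (a model evaluated on held-out data, a model without an intercept, a non-linear model), the second case is the relevant one, and a mismatch between the two quantities is expected rather than a bug.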
