According to the linked post on correlation and R2, the square of the Pearson correlation is mathematically equal to R2. However, when I try to replicate this with my data in Python, I do not get the same result. Could somebody explain what mistake I am making?
To calculate the Pearson correlation, I use the following two methods in Python, both of which give 0.0818:
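import numpy as np
from statistics import stdev

# Method 1: off-diagonal entry of the 2x2 correlation matrix
x = np.corrcoef(y_df, yhat)
print(x[0, 1])

# Method 2: sample covariance divided by the product of the sample standard deviations
sd1 = stdev(y_df)
sd2 = stdev(yhat)
cov = np.cov(yhat, y_df)[0, 1]  # np.cov returns a 2x2 matrix; take the off-diagonal covariance
print(cov / (sd1 * sd2))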
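For anyone who wants to check the two correlation calculations without the original data, here is a minimal sketch on synthetic data. The arrays `y` and `yhat` are made-up stand-ins for `y_df` and `yhat`, and the variable names are only illustrative. It also shows one thing to watch for: `np.cov` and `statistics.stdev` both divide by n-1 by default, while `np.std` divides by n unless `ddof=1` is passed.

```python
import numpy as np
from statistics import stdev

rng = np.random.default_rng(42)  # synthetic stand-ins, not the data from the question
y = rng.normal(size=100)
yhat = 0.3 * y + rng.normal(size=100)

# Pearson correlation from the correlation matrix
r_corrcoef = np.corrcoef(y, yhat)[0, 1]

# Pearson correlation from covariance and standard deviations.
# np.cov and statistics.stdev both use the n-1 denominator by default.
cov_xy = np.cov(yhat, y)[0, 1]
r_manual = cov_xy / (stdev(y.tolist()) * stdev(yhat.tolist()))

# np.std defaults to ddof=0, so ddof=1 is needed to match np.cov
r_numpy = cov_xy / (np.std(y, ddof=1) * np.std(yhat, ddof=1))

print(r_corrcoef, r_manual, r_numpy)
```

All three values should agree up to floating-point error, as long as the same denominator is used for the covariance and the standard deviations.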
To calculate the R2 score, I again use two different methods, both of which give 0.0001707:
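from sklearn.metrics import r2_score

# Method 1: scikit-learn
print(r2_score(y_df, yhat))

# Method 2: 1 - (residual sum of squares) / (total sum of squares)
ssr = ((y_df - yhat) ** 2).sum()  # sum of squared residuals
tss = ((y_df - y_df.mean()) ** 2).sum()  # total sum of squares
print(1 - ssr / tss)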
Here, the square of the Pearson coefficient (0.0818^2 ≈ 0.0067) is not equal to R2 (0.0001707). I do not understand which of my assumptions is wrong. Any help is appreciated.
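As a sanity check on the identity itself, here is a minimal sketch on synthetic data (not the data from the question; the helper `r2` and all variable names are made up for illustration). It assumes the identity is meant for in-sample fitted values of a least-squares fit with an intercept, and compares that case with predictions that are perfectly correlated with those fitted values but shifted and rescaled.

```python
import numpy as np

def r2(y_true, y_pred):
    """R2 as 1 - (residual sum of squares) / (total sum of squares)."""
    ssr = ((y_true - y_pred) ** 2).sum()
    tss = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ssr / tss

rng = np.random.default_rng(0)  # synthetic data, for illustration only
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Case 1: yhat are in-sample fitted values of a least-squares line with intercept
slope, intercept = np.polyfit(x, y, 1)
yhat_ols = slope * x + intercept
r_ols = np.corrcoef(y, yhat_ols)[0, 1]
r2_ols = r2(y, yhat_ols)

# Case 2: the same predictions, shifted and rescaled.
# The correlation with y is unchanged, but the residuals are not.
yhat_other = 0.5 * yhat_ols + 3.0
r_other = np.corrcoef(y, yhat_other)[0, 1]
r2_other = r2(y, yhat_other)

print(r_ols ** 2, r2_ols)      # agree up to floating-point error
print(r_other ** 2, r2_other)  # differ
```

In the first case the squared correlation and R2 coincide. In the second they do not, because correlation ignores shifts and positive rescalings of the predictions while R2 does not.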
Best Answer
Copying my comment: