I have data on restaurant ratings and the approximate cost of food at each restaurant. After running a hypothesis test, I found a Pearson correlation coefficient of 0.33 and a p-value far below 0.05. This means we have to reject the null hypothesis in favor of our alternative hypothesis, since the finding is statistically significant. Our alternative hypothesis was that there is some relation between the two variables. But this seems to contradict the coefficient itself, 0.33, which I read as saying there is no relation between the two. Why the discrepancy? Can anyone tell me whether I am going wrong or missing something conceptually?
Here is what I have tried.
You can download the data here
from scipy import stats
import pandas as pd
pear = pd.read_csv("stop.csv")
pearson_coef, p_value = stats.pearsonr(pear['approx_cost(for two people)'], pear['rate'])
print("Pearson Correlation Coefficient: ", pearson_coef, "and a P-value of:", p_value) # Results
Output :
Pearson Correlation Coefficient: 0.32609607011051456 and a P-value of: 3.595009214519809e-228
As I understand it, the p-value and the coefficient should agree with each other.
Any help would be highly appreciated.
Best Answer
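I think you've misunderstood what correlation coefficients mean. They range from $-1$ to $1$, with $0$ indicating no correlation (which is not the same as independence). A value of $0.326$ may be closer to $0$ than to $1$ (arithmetically, not geometrically!), but it's not "too small to count": it indicates a weak-to-moderate positive linear relationship. The $p$-value answers a different question. It tests the null hypothesis that the true correlation is exactly $0$, and says nothing about how strong the correlation is. With enough data, even a modest correlation can be distinguished from $0$ with near certainty, so a coefficient of $0.326$ and a tiny $p$-value do not contradict each other. The $p$-value shows your result is significant.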
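To see how the same coefficient can be non-significant or overwhelmingly significant depending on sample size, here is a minimal sketch. It is not what `stats.pearsonr` does internally: it swaps in the Fisher $z$-transform approximation so that it needs only the standard library. The helper name `approx_p_value` and the sample sizes are hypothetical, chosen for illustration, and are not taken from the dataset in the question.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation = 0.

    Uses the Fisher z-transform: atanh(r) * sqrt(n - 3) is roughly
    standard normal under H0. This is an approximation, not the exact
    test that scipy.stats.pearsonr reports.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal at |z|
    return math.erfc(abs(z) / math.sqrt(2))

r = 0.326  # coefficient from the question, rounded

# Hypothetical sample sizes: same r, very different p-values
for n in (10, 30, 100, 1000, 10000):
    print(f"n = {n:>5}   p ~ {approx_p_value(r, n):.3g}")

# Share of variance in one variable linearly explained by the other
print(f"r^2 = {r ** 2:.3f}")
```

The coefficient is held fixed throughout; only the sample size changes. The $p$-value shrinks as $n$ grows, while $r$ (and $r^2$) keep describing the same modest strength of association.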