Solved – Interpreting high p value and low correlation value

Tags: correlation, p-value, r, regression

I am trying to run a regression on financial data in R. I am new to regression analysis, so I am finding it difficult to interpret certain scenarios. My code is as follows:

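# Regression analysis: five-minute returns regressed on the predictor
fit <- lm(fiveMinReturns ~ RegressionData, data = maindata)
summary(fit)  # show coefficients, R-squared, and p-values
# Pearson correlation between the same two columns
cor(maindata$fiveMinReturns, maindata$RegressionData, use = "everything")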

My output is:

Call:
lm(formula = fiveMinReturns ~ RegressionData, data = maindata)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.205790 -0.001144 -0.000062  0.001117  0.156418 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    6.346e-05  8.785e-06   7.223 5.09e-13 ***
RegressionData 1.597e-07  1.432e-08  11.155  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.004035 on 210912 degrees of freedom
Multiple R-squared:  0.0005896, Adjusted R-squared:  0.0005849 
F-statistic: 124.4 on 1 and 210912 DF,  p-value: < 2.2e-16

cor(maindata$fiveMinReturns,maindata$RegressionData,use="everything")
[1] 0.02428219

The p-value is very small, which I take to mean that the two variables are tightly coupled, but the correlation is very small too.
My question is: how do I evaluate this situation?
Can we say that this equation will give correct results almost every time?
What kind of scenario produces both a very small p-value and a very small correlation?
What measures should I take to improve the result?

Best Answer

My question is how do I evaluate this situation?

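It means that the relationship between your two variables is very weak (r of about 0.024), and that it was only detected as statistically significant because your analysis had enormous power (more than 210,000 residual degrees of freedom). A small p-value says that the relationship is unlikely to be exactly zero; it does not say that the variables are tightly coupled.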
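
As a rough check on where the significance comes from, the t statistic for the slope in a simple regression can be recovered from the correlation and the residual degrees of freedom alone, using t = r * sqrt(df) / sqrt(1 - r^2). The sketch below plugs in the numbers from the output above; the variable names are illustrative and not part of the original code.

```r
# Values copied from the question's output
r  <- 0.02428219   # Pearson correlation
df <- 210912       # residual degrees of freedom (n - 2)

# t statistic for the slope in a one-predictor regression
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

print(t_stat)
print(p_val)
```

The t statistic comes out at about 11.15, matching the t value reported for RegressionData, and the p-value is far below 2e-16. The same r with only 100 observations would give a t statistic of about 0.24, which is nowhere near significant.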

Can we say that this equation will give correct results almost every time?

No, again, because the relationship between the two variables is very weak, and your predictor explains almost none of the variability in your dependent measure. Specifically, your model has an R^2 of 0.0005896 (adjusted R^2 of 0.0005849), which means that your independent variable explains about 0.06% (not 6%) of the variability in the dependent variable.
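
To make the size of that number concrete: in a regression with a single predictor, R^2 is the square of the Pearson correlation, and sqrt(1 - R^2) is approximately the ratio of the residual standard error to the standard deviation of the response. A minimal sketch using the correlation from the output above (variable names are illustrative):

```r
# Correlation copied from the question's output
r <- 0.02428219

# With one predictor, R-squared equals the squared correlation
r_squared <- r^2

# Share of variance explained, as a percentage
pct_explained <- 100 * r_squared

# Approximate ratio of prediction error with the model
# to prediction error from always predicting the mean
error_ratio <- sqrt(1 - r_squared)

print(r_squared)
print(pct_explained)
print(error_ratio)
```

The squared correlation reproduces the Multiple R-squared of 0.0005896 in the summary, and the error ratio of about 0.9997 means the fitted line predicts fiveMinReturns only about 0.03% better than the plain mean would.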

What kind of scenario produces both a very small p-value and a very small correlation?

If there is a relationship between the two variables, regardless of how large or small (i.e., H1 is true), the p-value will tend to become smaller as your sample grows larger. So it is entirely possible that the relationship between your variables is tiny, and yet you have still managed to detect it because your sample is so large.
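
This can be illustrated with a small simulation. The sketch below uses synthetic normal data, not the financial data from the question, and assumes a true correlation of 0.024 chosen to resemble the one observed; the sample sizes and the seed are arbitrary.

```r
set.seed(1)

true_r <- 0.024                  # assumed tiny true correlation
sizes  <- c(100, 10000, 200000)  # increasing sample sizes

# For each sample size, draw correlated data and test the correlation
p_values <- sapply(sizes, function(n) {
  x <- rnorm(n)
  y <- true_r * x + sqrt(1 - true_r^2) * rnorm(n)
  cor.test(x, y)$p.value
})

print(data.frame(n = sizes, p_value = p_values))
```

The effect size is identical in every row; only the sample size changes. At the largest sample size the p-value becomes extremely small, even though x still explains well under 1% of the variance in y.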

What measures should I take to improve the result?

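I'm not sure what you mean. The result is fine as it is. You tested a hypothesis, and the data show a relationship that is statistically detectable but practically negligible, so they do not really support the idea that RegressionData is useful for predicting fiveMinReturns.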