I am trying to run a regression on financial data in R. I am new to regression analysis, so I am finding it difficult to interpret certain scenarios. My code is as follows:
#regression analysis
fit <- lm(fiveMinReturns~RegressionData, data=maindata)
summary(fit) # show results
#correlation
cor(maindata$fiveMinReturns,maindata$RegressionData,use="everything")
My output is:
Call:
lm(formula = fiveMinReturns ~ RegressionData, data = maindata)
Residuals:
Min 1Q Median 3Q Max
-0.205790 -0.001144 -0.000062 0.001117 0.156418
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.346e-05 8.785e-06 7.223 5.09e-13 ***
RegressionData 1.597e-07 1.432e-08 11.155 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.004035 on 210912 degrees of freedom
Multiple R-squared: 0.0005896, Adjusted R-squared: 0.0005849
F-statistic: 124.4 on 1 and 210912 DF, p-value: < 2.2e-16
cor(maindata$fiveMinReturns,maindata$RegressionData,use="everything")
[1] 0.02428219
The p-value is very small, which I take to mean that the two variables are tightly coupled, but the correlation is very small too.
My question is how do I evaluate this situation?
Can we say that this equation will give correct results almost every time?
In which scenario would both the p-value and the correlation be really small?
What measures should I take to improve the result?
Best Answer
It means that the relationship between your two variables is very weak, and was only detected because your analysis had massive power (over 210,000 residual degrees of freedom).
No, again, because the relationship between the two variables is very weak, and your predictor does not explain much of the variability in your dependent measure. Specifically, your model has an R^2 of 0.0005896 (0.0005849 adjusted), which means that your independent variable explains about 0.06% (not 6%) of the variability in the dependent variable.
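As a quick check on those figures: in a regression with a single predictor, R^2 is the squared Pearson correlation, and the slope's t value can be recovered from the correlation and the residual degrees of freedom. A minimal sketch in R, using only the numbers printed in the question (the variable names are just for illustration):

```r
r  <- 0.02428219   # correlation reported by cor() in the question
df <- 210912       # residual degrees of freedom from summary(fit)

# With one predictor, R^2 equals r^2; this reproduces the Multiple R-squared line
r2 <- r^2

# t statistic for the slope implied by r and df; this reproduces the t value column
t_stat <- r * sqrt(df / (1 - r^2))

r2 * 100   # percent of variance explained, well below 1%
t_stat
```

So the tiny correlation and the tiny p-value are two views of the same fit: the t value is large only because it scales with the square root of the degrees of freedom.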
If there is a relationship between the two variables, regardless of how large or small (i.e., H1 is true), the p-value will tend to become smaller as your sample grows larger. So, it is entirely possible that the relationship between your variables is tiny, and yet you've still managed to detect it because your sample is so huge.
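This can be illustrated with a small simulation. The sketch below is not based on your data: it assumes standard normal noise and a made-up true slope of 0.025 (chosen to give a correlation of roughly the size you observed), and `sim_p` is a hypothetical helper written for this example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: simulate a weak linear relationship of fixed strength
# and return the sample size, the sample correlation, and the slope's p-value.
sim_p <- function(n, slope = 0.025) {
  x <- rnorm(n)
  y <- slope * x + rnorm(n)   # true correlation is roughly 0.025
  fit <- lm(y ~ x)
  c(n = n,
    r = cor(x, y),
    p = summary(fit)$coefficients["x", "Pr(>|t|)"])
}

# Same underlying effect, three very different sample sizes
res <- t(sapply(c(100, 10000, 200000), sim_p))
res
```

The correlation stays small at every sample size; only the p-value changes, and at the largest n it should be far below any conventional threshold.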
I'm not sure what you mean by improving the result. The result is fine as it is: you tested a hypothesis, and the data show only a negligible relationship.