I used summary.formula from Hmisc with a continuous predictor Age and a binary outcome O, with test=TRUE. This returned a p-value for Age predicting O (if I understand it correctly). I then ran a glm with Age and O (univariate logistic regression), which returned a different p-value. I thought the p-values should be the same?
Edit:

library(Hmisc)
summary.formula(Outcome ~ cut2(Age, seq(15, 75, 10)), method = "reverse", test = TRUE)
# p-value = 0.8
x <- glm(Outcome ~ Age, family = binomial(link = "logit"))
# p-value = 0.4
(I'm getting quite confused: the example above that gives the 0.8 p-value bins the ages, but I was told the p-value it returns applies to all of the ages. If I run the summary without the binning, the p-value is 0.5.)
Best Answer
First, you need to know what a p-value is: the probability of observing results as extreme as, or more extreme than, the ones you have, if the null hypothesis were in fact true.
The reason you aren't getting the same p-value from your two commands is that they aren't testing the same null hypothesis under the same assumptions. With test=TRUE, summary.formula (method = "reverse") runs, by default, a Wilcoxon/Kruskal-Wallis rank test for a continuous variable, and a Pearson chi-square test once you bin Age with cut2, whereas summary(glm(...)) reports a Wald test on the slope of the logistic regression. Different tests, different p-values; any stats textbook covers the distinction.
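To see that the two procedures really are different tests, here is a hypothetical sketch in Python rather than R (the data, sample size, and coefficients are invented for illustration): it fits a univariate logistic regression and computes the Wald p-value for the slope, then runs a rank-sum test comparing Age between the two outcome groups, analogous to the default continuous-variable test behind test=TRUE.

```python
import numpy as np
from scipy import optimize, stats

# Simulated data (hypothetical effect sizes, for illustration only)
rng = np.random.default_rng(1)
n = 300
age = rng.uniform(15, 75, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.03 * age))))

# 1) Wald test on the slope of a univariate logistic regression
#    (the kind of p-value summary(glm(...)) reports)
X = np.column_stack([np.ones(n), age])

def negloglik(beta):
    eta = X @ beta
    return np.sum(np.logaddexp(0, eta) - y * eta)  # numerically stable log-likelihood

fit = optimize.minimize(negloglik, np.zeros(2), method="BFGS")
prob = 1 / (1 + np.exp(-(X @ fit.x)))
info = X.T @ (X * (prob * (1 - prob))[:, None])   # observed information matrix
se_slope = np.sqrt(np.linalg.inv(info)[1, 1])
p_wald = 2 * stats.norm.sf(abs(fit.x[1] / se_slope))

# 2) Rank-sum test comparing age between the two outcome groups
#    (analogous to the default continuous-variable test in summary.formula)
p_rank = stats.mannwhitneyu(age[y == 1], age[y == 0]).pvalue

print(p_wald, p_rank)   # two valid tests of related but different hypotheses
```

Both p-values address whether Age is associated with the outcome, but one tests the logistic slope under a parametric model while the other compares rank distributions, so they will generally not agree.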
If the relationship between two variables is approximately linear, binning reduces statistical power, which probably explains why you get a lower p-value without binning. Think of it this way: binning treats different values within a bin as identical. The differences within a bin may carry real information about the outcome, but a test on the binned variable ignores them.
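The power loss from binning can be illustrated with a small simulation, again sketched in Python under assumed sample sizes and effect sizes: each replicate draws data with a linear effect of age on the logit scale, then tests the association once with age left continuous (rank-sum test) and once with age cut into 10-year bins (chi-square test), comparing rejection rates at alpha = 0.05.

```python
import numpy as np
from scipy import stats

# Hypothetical simulation (invented n and slope); the true effect of age on
# the outcome is linear on the logit scale, so binning discards information.
rng = np.random.default_rng(0)

def one_sim(n=150, slope=0.03):
    age = rng.uniform(15, 75, n)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-1.35 + slope * age))))
    if y.min() == y.max():                     # degenerate draw, skip it
        return None
    # Test 1: age left continuous, rank-sum test between outcome groups
    p_cont = stats.mannwhitneyu(age[y == 1], age[y == 0]).pvalue
    # Test 2: age binned into 10-year groups, chi-square test of independence
    bins = np.digitize(age, np.arange(25, 76, 10))
    tab = np.array([[np.sum((bins == b) & (y == k)) for k in (0, 1)]
                    for b in np.unique(bins)])
    _, p_bin, _, _ = stats.chi2_contingency(tab)
    return p_cont, p_bin

results = [r for r in (one_sim() for _ in range(500)) if r is not None]
power_cont = np.mean([p < 0.05 for p, _ in results])
power_binned = np.mean([p < 0.05 for _, p in results])
print(power_cont, power_binned)   # the binned test rejects less often
```

The chi-square test on the binned variable also ignores the ordering of the bins, which compounds the loss: a monotone trend across bins counts no more toward rejection than an arbitrary scatter.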