Solved – Chi-squared versus logistic regression

chi-squared-test, logistic, p-value

I used summary.formula from Hmisc with a continuous predictor Age and a binary outcome O, with test=TRUE. This returned a p-value for Age predicting O (if I understand the output correctly).

I then ran a glm with Age and O (a univariate logistic regression), which returned a different p-value. Shouldn't the two p-values be the same?

Edit:

library(Hmisc)

summary.formula(Outcome ~ cut2(Age, seq(15,75,10)), method="reverse", test=TRUE)

p-value = 0.8

x = glm(Outcome ~ Age, family=binomial(link="logit"))

p-value = 0.4

(I'm getting quite confused: the example above, which gives the 0.8 p-value, bins the ages, yet I was told that the p-value it returns applies to all the ages. However, if I run the summary without the binning, the p-value is 0.5.)
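For completeness, here is a self-contained version of what I ran, with a simulated data frame d standing in for my real data (so the p-values will of course differ from the numbers above):

library(Hmisc)

# Simulated stand-in for my data: Age in years, binary Outcome
set.seed(42)
d <- data.frame(Age = runif(200, 15, 75))
d$Outcome <- factor(rbinom(200, 1, plogis(-2 + 0.02 * d$Age)))

# Age binned into 10-year groups, summarised by Outcome, with a test per variable
summary.formula(Outcome ~ cut2(Age, seq(15, 75, 10)),
                method = "reverse", test = TRUE, data = d)

# Univariate logistic regression of Outcome on continuous Age
summary(glm(Outcome ~ Age, family = binomial(link = "logit"), data = d))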

Best Answer

First, you need to know what a p-value is. A p-value is the probability of observing results as extreme as, or more extreme than, the ones you have, if the null hypothesis were in fact true.

The reason you aren't getting the same p-value from your two analyses is that you aren't testing the same null hypothesis under the same assumptions. Pick up a stats textbook.
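To make that concrete, here is a sketch (my own illustration, not output from your session) using a simulated data frame d with a numeric Age and a 0/1 Outcome, and relying on what I believe are the Hmisc defaults for the tests:

library(Hmisc)

set.seed(1)
d <- data.frame(Age = runif(300, 15, 75))
d$Outcome <- rbinom(300, 1, plogis(-2 + 0.02 * d$Age))

# summary.formula(..., method="reverse", test=TRUE) uses, by default (as far
# as I know), a Pearson chi-squared test for a categorical variable such as
# the binned cut2(Age, ...):
chisq.test(table(cut2(d$Age, seq(15, 75, 10)), d$Outcome))

# ...and a Kruskal-Wallis (Wilcoxon for two groups) test when Age is left
# continuous:
kruskal.test(Age ~ Outcome, data = d)

# The glm p-value is a Wald z-test that the log-odds slope for Age is zero:
coef(summary(glm(Outcome ~ Age, family = binomial, data = d)))["Age", ]

Three different null hypotheses, three different test statistics, so there is no reason for the three p-values to agree.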

If the relationship between two variables is approximately linear, binning will reduce statistical power, which probably explains why you get a lower p-value without binning. Think of it this way: binning treats different values within a bin as identical. The information in those within-bin differences may be valuable, but a test run on the binned data ignores it.
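Here is a quick simulation sketch of that last point (my own illustration, with made-up effect and sample sizes): when the true effect of Age is linear on the log-odds scale, the Wald test from the logistic regression tends to reject more often than a chi-squared test on the binned ages.

library(Hmisc)   # for cut2()

set.seed(1)
one_run <- function(n = 300, beta = 0.03) {
  Age     <- runif(n, 15, 75)
  Outcome <- rbinom(n, 1, plogis(-2 + beta * Age))   # effect linear in log-odds
  bins    <- cut2(Age, seq(15, 75, 10))
  c(chisq = suppressWarnings(chisq.test(table(bins, Outcome))$p.value),
    wald  = coef(summary(glm(Outcome ~ Age, family = binomial)))["Age", 4])
}

pvals <- replicate(1000, one_run())
rowMeans(pvals < 0.05)   # proportion of runs rejecting at the 5% level, per test

The exact rejection rates depend on the effect and sample sizes I made up, but the ordering is what matters: the test on continuous Age has more power because it uses the within-bin information that the chi-squared test throws away.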
