I am using the glm function in R to fit robust Poisson regression models.
The confidence interval this produces is not consistent with the p-value from the model: confidence intervals that do not contain 0 have p-values greater than 0.05. With confint.glm I obtain this CI: (0.078, 2.480), while summary.glm reports a p-value of 0.07 (= 2*pnorm(z, lower.tail=F)):
Call:
glm2(formula = paste("base~", var, sep = ""), family = poisson(link = log),
data = base2)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8712  -0.8712  -0.8712   0.8347   1.5279
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -2.0369     0.5772  -3.529 0.000418 ***
augm_alcoolpas augm   1.0680     0.5908   1.808 0.070656 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 138.88 on 188 degrees of freedom
Residual deviance: 134.30 on 187 degrees of freedom
AIC: 270.3
Number of Fisher Scoring iterations: 5
And when I run anova(mod), I obtain another p-value, which is significant (p-value = 0.03):
Analysis of Deviance Table
Model: poisson, link: log
Response: base
Terms added sequentially (first to last)
            Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                          188     138.88
augm_alcool  1   4.5794       187     134.30  0.03236 *
I understand that the p-value from anova() differs from the one from summary.glm() because the p-value in anova() comes from a chi-square (likelihood ratio) test, while the p-value in summary.glm comes from a Wald test.
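Both p-values can be recomputed by hand from the printed output. The following is a minimal sketch that uses only the z value and the deviance drop copied from the tables above; the names z_wald and dev_drop are illustrative, and because the inputs are rounded the results match the printed p-values only approximately.

```r
# Statistics copied from the printed output above (rounded as displayed).
z_wald   <- 1.808    # z value for augm_alcool in summary()
dev_drop <- 4.5794   # drop in deviance for augm_alcool in anova(), on 1 df

# Wald test: two-sided tail area of a standard normal.
p_wald <- 2 * pnorm(abs(z_wald), lower.tail = FALSE)

# Likelihood ratio test: upper tail of a chi-square with 1 df.
p_lrt <- pchisq(dev_drop, df = 1, lower.tail = FALSE)

print(c(wald = p_wald, lrt = p_lrt))
```

The two statistics measure different things (a standardized estimate versus a change in deviance), so there is no reason for the two tail areas to coincide in a finite sample.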
I have two questions:
- In the summary results, why is the p-value not significant (at the same level as was used to calculate the CI)?
- What could I do differently so that inference agrees with the CI?
Best Answer
Basically, the Wald test (calculated by summary.glm) and the likelihood ratio test (calculated by anova()) do not lead to the same inference at the arbitrary 0.05 threshold. If you interpret the p-value correctly, one test says there is a 7.1% chance of replicating the study and obtaining results that are as inconsistent or more inconsistent with the null hypothesis, given that it is true; the other test estimates that probability at 3.2%. This is not a very compelling difference, especially with a sample of only 189 observations.
To get some intuition on why the tests may differ, consider the plot below, which shows a hypothetical likelihood calculated over a range of possible parameter values. a_0 denotes the null hypothesis, at some arbitrary value, and the apex of the quadratic curve is the maximum likelihood estimate (estimated by glm). The Wald test measures the (scaled) horizontal distance between the two values, whereas the LRT measures the vertical distance. As the sample size approaches infinity, these tests converge to the same value and inference. But in small samples, they at times disagree. When they do, it's important to interpret findings as "borderline" significant, or with similar cautionary language (if it's necessary to use testing at all).
To have inference which agrees with the CI, pair each test with the interval built on the same principle. With the Wald-based inference from summary.glm and a Wald CI (for example from confint.default), if a coefficient's 95% CI does not contain 0, the p-value will be less than 0.05. The interval from confint.glm is instead based on the profile likelihood, so it lines up with the likelihood ratio test from anova().
Image source: http://web.archive.org/web/20161220161700/http://www.ats.ucla.edu:80/stat/mult_pkg/faq/general/nested_tests.htm
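The pairing of tests and intervals can be checked in R. The sketch below first rebuilds a Wald 95% interval from the estimate and standard error printed in the question, then fits a Poisson model to simulated stand-in data to show which functions produce which pair. The data frame d, its columns x and y, and the object fit are hypothetical and are not the asker's data; confint.default computes Wald intervals, while confint on a glm object computes profile-likelihood intervals.

```r
# Wald 95% CI rebuilt from the printed estimate and standard error.
est <- 1.0680
se  <- 0.5908
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se
print(wald_ci)  # the lower bound falls below 0, in line with the Wald p-value of 0.07

# Simulated stand-in data (hypothetical; NOT the data from the question).
set.seed(1)
d <- data.frame(x = factor(rep(c("a", "b"), times = c(60, 129))))
d$y <- rpois(nrow(d), lambda = ifelse(d$x == "a", 0.15, 0.40))
fit <- glm(y ~ x, family = poisson(link = "log"), data = d)

# Pair 1: Wald test with Wald interval.
p_wald  <- coef(summary(fit))["xb", "Pr(>|z|)"]
ci_wald <- confint.default(fit)["xb", ]

# Pair 2: likelihood ratio test with profile-likelihood interval.
p_lrt   <- anova(fit, test = "Chisq")[2, "Pr(>Chi)"]
ci_prof <- confint(fit)["xb", ]

print(list(p_wald = p_wald, ci_wald = ci_wald,
           p_lrt = p_lrt, ci_prof = ci_prof))
```

Within each pair the conclusion at the 5% level is the same: the Wald interval excludes 0 exactly when the Wald p-value is below 0.05, and the profile-likelihood interval is built by inverting the same deviance comparison that the likelihood ratio test uses.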