A “high” standard error (in logistic regression)

logistic-regression, regression-coefficients, standard-error

I can't find in any statistics book what would be considered a large standard error for a regression coefficient.

In my research, one level of a categorical variable has a small number of cases, and in a logistic regression it gets what I think is a rather large standard error (0.647); but because the coefficient B is large (-1.394), it is still significant (p = 0.031).

Is a standard error of 0.6 or 0.7 really an indicator that something is wrong? (In my case, it could be that the predictors carry incomplete information, i.e. there is no data for every combination of my predictor variables.)

Best Answer

Acceptable levels of variability depend on the actual value of the odds ratio as well as the intended application. Essentially, the only case that is a problem in all circumstances is when a categorical predictor has cells with 0 counts.

The estimated odds ratio can take any value from 0 to $\infty$, inclusive, for a sample of any size. Inference is based on the Wald statistic: the log odds ratio divided by its standard error is compared to a standard normal distribution. In your case, the Wald statistic is -1.394 / 0.647 = -2.15, and the two-tailed test is statistically significant at the 0.05 level, so we conclude these data are inconsistent with a null hypothesis of no association. The upper bound of the 95% CI is exp(-1.394 + 1.96 × 0.647) = 0.88, which is a small odds ratio in some circumstances and large in others.
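If it helps to see the arithmetic in one place, here is a minimal Python sketch (not from the original answer) that reproduces the Wald statistic, two-tailed p-value, and 95% CI from the quoted coefficient and standard error:

```python
from math import exp
from scipy.stats import norm

b, se = -1.394, 0.647            # reported coefficient (log odds ratio) and its standard error
z = b / se                       # Wald statistic: about -2.15
p = 2 * norm.sf(abs(z))          # two-tailed p-value: about 0.031
ci = (exp(b - 1.96 * se), exp(b + 1.96 * se))   # 95% CI for the odds ratio: about (0.07, 0.88)
print(z, p, ci)
```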

For the $2 \times 2$ contingency table of a binary predictor and a binary response with cell counts $a, b, c, d$, the odds ratio estimate is $ad/bc$ and the variance of the log odds ratio is $1/a + 1/b + 1/c + 1/d$. If any cell entry is 0, the variance is infinite, which is undesirable.
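As a rough illustration (my own sketch, with made-up cell counts), the estimate and its standard error can be computed directly from the table:

```python
import math

def odds_ratio_and_se(a, b, c, d):
    """Odds ratio estimate ad/bc and SE of the log odds ratio for a 2x2 table
    with cell counts a, b, c, d (hypothetical numbers below)."""
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, se_log_or

print(odds_ratio_and_se(10, 20, 15, 30))
# With any cell equal to 0 the variance 1/a + 1/b + 1/c + 1/d blows up
# (and the point estimate collapses to 0 or infinity).
```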

The Mantel–Haenszel estimator will give infinite variance if any of the stratum-specific tables has a 0 cell count. Logistic regression has no such obvious explosion from over-stratification; however, small-sample bias is an issue, causing the estimated odds ratio to tend toward twice its value and the variance to tend toward infinity.
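A quick simulation sketch (my own, assuming an arbitrary true odds ratio of 2 and 15 subjects per group, chosen purely for illustration) shows how the estimated odds ratio is inflated in small samples:

```python
import numpy as np

rng = np.random.default_rng(0)
true_or = 2.0                                   # assumed true odds ratio (illustrative only)
p0, n = 0.3, 15                                 # baseline risk and (small) group size
p1 = true_or * p0 / (1 - p0 + true_or * p0)     # exposed-group risk implied by that OR

estimates = []
for _ in range(20_000):
    a = rng.binomial(n, p1); b = n - a          # exposed group: events / non-events
    c = rng.binomial(n, p0); d = n - c          # unexposed group: events / non-events
    if min(a, b, c, d) > 0:                     # skip tables with a 0 cell
        estimates.append((a * d) / (b * c))

print(np.mean(estimates))   # tends to land well above the true value of 2.0
```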

Some corrections have been proposed to address this issue: one is to add 1 to all cells, which gives a biased estimator of the odds ratio with better variance. Median-unbiased estimation is also possible for bivariate analyses, since median unbiasedness is invariant to monotonic transformations.
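As a hedged sketch of the cell-count correction idea (adding 1 as mentioned above; adding 0.5 instead is the Haldane–Anscombe variant), the adjusted estimate and its standard error stay finite even when a cell is 0:

```python
import math

def corrected_odds_ratio(a, b, c, d, k=1.0):
    """Add a constant k to every cell before estimating; k=1 matches the
    correction mentioned above, k=0.5 is the Haldane-Anscombe variant."""
    a, b, c, d = a + k, b + k, c + k, d + k
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, se_log_or

print(corrected_odds_ratio(0, 20, 15, 30))   # finite estimate and SE despite the 0 cell
```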
