Solved – Stepwise binary logit regression – help for bootstrapping in Stata

binary data, bootstrap, logit, regression, stepwise regression

I am running a stepwise binary logit regression in Stata with 14 independent variables. Two of the independent variables are dummies (taking a value of 0 or 1). I've tested the independent variables for multicollinearity and transformed them, by standardizing or taking the natural logarithm of their values, to mitigate this issue (VIF < 2.5). The normal model runs smoothly; however, when I bootstrap the sample (number of observations: 73) with 1000 replications, I receive p-values of 1.0000. Furthermore, the results conclude with the note: "one or more parameters could not be estimated in 314 bootstrap replicates; standard-error estimates include only complete replications."

Two questions:
1. Is the VIF threshold that I used correct (VIF < 2.5)? What other ways are there to get rid of multicollinearity without dropping one of the variables?
2. Since I assume that multicollinearity is no longer an issue, what else could I have done wrong in my bootstrapping methodology?

Many thanks in advance for your answer(s)!

Best!
Tim

Best Answer

Consider not doing stepwise regression, which is a good way to almost ensure biased results:

Malek, M. H., Berger, D. E., and Coburn, J. W. (2007). On the inappropriateness of stepwise regression analysis for model building and testing. European Journal of Applied Physiology, 101(2):263–264.

Steyerberg, E. W., Eijkemans, M. J., and Habbema, J. D. F. (1999). Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. Journal of Clinical Epidemiology, 52(10):935–942.

Whittingham, M., Stephens, P., Bradbury, R., and Freckleton, R. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75(5):1182–1189.
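
The selection bias these papers describe can be illustrated with a small simulation. The following is only a rough sketch, not a reimplementation of Stata's stepwise procedure or of any of the cited studies. It assumes 73 observations and 14 predictors (the sizes from the question), and every predictor is pure noise, so the true effect of each is zero. Instead of fitting a logit model, it uses a deliberately simplified stand-in for selection: it keeps whichever predictor shows the largest difference in means between the two outcome groups. The function name `simulate_selection_bias` and all of its parameters are hypothetical.

```python
import random
import statistics


def simulate_selection_bias(n_obs=73, n_predictors=14, n_sims=500, seed=0):
    """Compare the apparent effect of a 'selected' noise predictor with that
    of a predictor chosen in advance.

    Every predictor is independent of the outcome, so the true difference in
    group means is zero for all of them. Returns a tuple
    (mean |difference| of the selected predictor,
     mean |difference| of a pre-specified predictor).
    """
    rng = random.Random(seed)
    selected = []       # apparent effect of the best-looking predictor
    prespecified = []   # apparent effect of predictor 0, fixed in advance

    for _ in range(n_sims):
        # Binary outcome, unrelated to any predictor.
        y = [rng.random() < 0.5 for _ in range(n_obs)]
        if all(y) or not any(y):
            continue  # skip the (practically impossible) one-group sample

        diffs = []
        for _ in range(n_predictors):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
            group1 = [xi for xi, yi in zip(x, y) if yi]
            group0 = [xi for xi, yi in zip(x, y) if not yi]
            diffs.append(statistics.fmean(group1) - statistics.fmean(group0))

        # Simplified "selection": keep the predictor that looks strongest.
        selected.append(max(abs(d) for d in diffs))
        prespecified.append(abs(diffs[0]))

    return statistics.fmean(selected), statistics.fmean(prespecified)


if __name__ == "__main__":
    sel, pre = simulate_selection_bias()
    print(f"mean |difference|, selected predictor:      {sel:.3f}")
    print(f"mean |difference|, pre-specified predictor: {pre:.3f}")
```

Under these assumptions, the selected predictor should show a noticeably larger apparent effect than the pre-specified one, even though neither has any real relationship with the outcome. Data-driven selection in a small sample tends to exaggerate effects in much the same way, whatever the exact selection rule.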