I have a binary logistic regression model with a DV (disease: yes/no) and 5 predictors: demographics (age, gender, tobacco smoking [yes/no]), a medical index (ordinal) and one random treatment (yes/no). I have also modeled all the two-way interaction terms. The main variables are centered and there is no sign of multicollinearity (all VIFs < 2.5).
I have some questions:
- Is bootstrapping advantageous over my single model? If so,
- which bootstrapped model should I choose? I wanted to see whether bootstrapping algorithms create new samples randomly or follow a rigid algorithm. Therefore, I resampled 1000 times in each attempt (so I have several bootstrapped models, each with 1000 trials). However, the coefficients of the bootstrapped model differ each time (although the number of trials is always 1000). So I wonder which one I should choose for my report. Some changes are tiny and don't affect my coefficients' significance, but some make some of my coefficients non-significant (only those with P values close to 0.05 in the original model, which change to 0.06, for example).
- Should I choose a higher number, like 10,000? How can I determine this limit?
- Again, should I bootstrap in the first place? If its results vary each time, can I rely on them?
- Do you have any other ideas that can help me with my case?
Many many thanks.
Best Answer
Bootstrapping is a resampling method to estimate the sampling distribution of your regression coefficients and therefore calculate the standard errors/confidence intervals of your regression coefficients. This post has a nice explanation. For a discussion of how many replications you need, see this post.
`boot` in `R`, for example, reports the "bias", which is the difference between the regression coefficients of your single model and the mean of the bootstrap samples. Here is an example in `R`:
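A minimal sketch of such an example, assuming simulated data: the variable names (`age`, `smoke`, `treat`), the sample size and the effect sizes are invented for illustration and are not the data discussed in this thread, so the numbers it prints will differ from any quoted here. It fits a single logistic model, bootstraps its coefficients with `boot()`, and recomputes the reported bias and standard errors by hand.

```r
library(boot)  # recommended package, shipped with standard R installations

set.seed(42)   # fixes the random resampling so the run is reproducible

# Simulated data; all names and effect sizes are assumptions for illustration
n     <- 300
age   <- rnorm(n, mean = 50, sd = 10)
smoke <- rbinom(n, 1, 0.3)
treat <- rbinom(n, 1, 0.5)
eta   <- -0.5 + 0.04 * (age - 50) + 0.8 * smoke - 0.6 * treat
dat   <- data.frame(disease = rbinom(n, 1, plogis(eta)), age, smoke, treat)

# The single (original) model
fit <- glm(disease ~ age + smoke + treat, data = dat, family = binomial)

# Statistic for boot(): refit the model on the resampled rows
# and return its coefficients
coef_fun <- function(data, idx) {
  coef(glm(disease ~ age + smoke + treat, data = data[idx, ], family = binomial))
}

boot_out <- boot(data = dat, statistic = coef_fun, R = 2000)
print(boot_out)  # columns: original, bias, std. error

# The same quantities computed by hand from the bootstrap replicates
boot_bias <- colMeans(boot_out$t) - boot_out$t0  # mean of replicates minus original
boot_se   <- apply(boot_out$t, 2, sd)            # SD of replicates

# Model-based standard errors of the single model, for comparison
model_se <- sqrt(diag(vcov(fit)))
```

Rerunning with a different seed changes `boot_bias` and `boot_se` slightly; that run-to-run variation is Monte Carlo error and shrinks as `R` grows.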
The bootstrap output displays the original regression coefficients ("original") and their bias, which is the difference between the original coefficients and the bootstrapped ones. It also gives the standard errors. Note that they are a bit larger than the original standard errors.
Of the confidence intervals, the bias-corrected and accelerated ("bca") ones are usually preferred. They are given on the original scale of the coefficients (log-odds). For confidence intervals for the odds ratios, just exponentiate the confidence limits.
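As a hedged illustration of that last step, the sketch below computes a BCa interval with `boot.ci()` and exponentiates its limits. The data are again simulated, and the names `age` and `treat`, the effect sizes and the choice of `R = 2000` are assumptions made for the example. Note that `R` is kept larger than the number of observations, which the BCa calculation in `boot` needs.

```r
library(boot)

set.seed(1)  # reproducible resampling

# Simulated data; names and effect sizes are assumptions for illustration
n     <- 200
age   <- rnorm(n, mean = 50, sd = 10)
treat <- rbinom(n, 1, 0.5)
dat   <- data.frame(
  disease = rbinom(n, 1, plogis(-0.3 + 0.03 * (age - 50) - 0.7 * treat)),
  age, treat
)

# Statistic: logistic regression coefficients on the resampled rows
coef_fun <- function(data, idx) {
  coef(glm(disease ~ age + treat, data = data[idx, ], family = binomial))
}

boot_out <- boot(data = dat, statistic = coef_fun, R = 2000)

# BCa interval for the third coefficient (treat), on the log-odds scale
ci <- boot.ci(boot_out, type = "bca", index = 3)
log_odds_ci <- ci$bca[4:5]   # lower and upper confidence limits

# Exponentiate the limits to get the interval for the odds ratio
or_ci <- exp(log_odds_ci)
or_ci
```

The same call with a different `index` gives the interval for another coefficient.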