I think we should let Venables and Ripley have the word here (MASS, p. 198):
There is one fairly common circumstance in which both convergence
problems and the Hauck-Donner phenomenon can occur. This is when the
fitted probabilities are extremely close to zero or one. Consider a
medical diagnosis problem with thousands of cases and around fifty
binary explanatory variables (which may arise from coding fewer
categorical factors); one of these indicators is rarely true but
always indicates that the disease is present. Then the fitted
probabilities of cases with that indicator should be one, which can
only be achieved by taking $\hat\beta_i = \infty$. The result from glm will be warnings and an estimated coefficient of around +/- 10.
Apart from potential numerical difficulties, there is no formal problem with fitted probabilities that are numerically 0 or 1. However, the Wald test of the hypothesis $\beta_i = 0$, which is based on a quadratic approximation to the log-likelihood, can become a poor approximation of the likelihood ratio test: the Wald statistic may appear insignificant even though the hypothesis is clearly wrong (the Hauck-Donner phenomenon). As I understand it, this is what the warning is about.
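A small simulated sketch of the phenomenon (hypothetical data, not from the question): a rarely-true indicator that always implies the outcome produces a huge coefficient with an even larger standard error, so the Wald test looks insignificant while the likelihood ratio test is clearly significant.

```r
## Hypothetical illustration of quasi-complete separation and the
## Hauck-Donner effect (simulated data)
set.seed(1)
n <- 1000
x <- rbinom(n, 1, 0.02)                    # rarely-true binary indicator
y <- ifelse(x == 1, 1, rbinom(n, 1, 0.3))  # outcome is always 1 when x is 1

fit <- glm(y ~ x, family = binomial)
## glm typically warns: "fitted probabilities numerically 0 or 1 occurred"

summary(fit)$coefficients["x", ]
## enormous estimate and standard error; Wald p-value close to 1

anova(fit, test = "Chisq")
## the likelihood ratio test for x, in contrast, is highly significant
```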
With many predictors, a situation like the one Venables and Ripley describe can easily occur: a predictor is uninformative for most observations, but for a few cases it is a very strong predictor.
I would suggest using Frank Harrell's excellent rms package. It contains many useful functions to validate and calibrate your model. As far as I know, you cannot assess predictive performance solely from the coefficients. Further, I would suggest using the bootstrap to validate the model. The AUC or concordance index (c-index) is a useful measure of predictive performance. A c-index of $0.8$ is quite high, but as with many predictive models, the apparent fit of your model is likely overoptimistic (overfitting). This overoptimism can be assessed with the bootstrap. Let me give an example:
#-----------------------------------------------------------------------------
# Load packages
#-----------------------------------------------------------------------------
library(rms)
#-----------------------------------------------------------------------------
# Load data
#-----------------------------------------------------------------------------
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
#-----------------------------------------------------------------------------
# Fit logistic regression model
#-----------------------------------------------------------------------------
mylogit <- lrm(admit ~ gre + gpa + rank, x=TRUE, y=TRUE, data = mydata)
mylogit
                     Model Likelihood     Discrimination    Rank Discrim.
                        Ratio Test           Indexes           Indexes
Obs          400    LR chi2     41.46    R2       0.138    C       0.693
 0           273    d.f.            5    g        0.838    Dxy     0.386
 1           127    Pr(> chi2) <0.0001   gr       2.311    gamma   0.387
max |deriv| 2e-06                        gp       0.167    tau-a   0.168
                                         Brier    0.195

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept -3.9900 1.1400 -3.50  0.0005
gre        0.0023 0.0011  2.07  0.0385
gpa        0.8040 0.3318  2.42  0.0154
rank=2    -0.6754 0.3165 -2.13  0.0328
rank=3    -1.3402 0.3453 -3.88  0.0001
rank=4    -1.5515 0.4178 -3.71  0.0002
At the bottom you see the usual regression coefficients with their $p$-values. At the top right, you see several discrimination indices. C denotes the c-index (equivalent to the AUC): a c-index of $0.5$ corresponds to random splitting, whereas a c-index of $1$ denotes perfect prediction. Dxy is Somers' $D_{xy}$, the rank correlation between the predicted probabilities and the observed responses. $D_{xy}$ has a simple relationship with the c-index: $D_{xy}=2(c-0.5)$. A $D_{xy}$ of $0$ occurs when the model's predictions are random, and when $D_{xy}=1$ the model discriminates perfectly. Here, the c-index is $0.693$, which is better than chance but probably not high enough for predicting the outcomes of individuals; for that, a c-index of $>0.8$ is usually considered necessary.
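The relationship between the two indices is easy to check against the printout above:

```r
## Check Dxy = 2(c - 0.5) against the lrm printout (C = 0.693, Dxy = 0.386)
c_index <- 0.693
Dxy <- 2 * (c_index - 0.5)
Dxy   # 0.386, matching the Rank Discrim. column
```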
As noted above, the model is likely overoptimistic. We now use the bootstrap to quantify the optimism:
#-----------------------------------------------------------------------------
# Validate model using bootstrap
#-----------------------------------------------------------------------------
my.valid <- validate(mylogit, method="boot", B=1000)
my.valid
index.orig training test optimism index.corrected n
Dxy 0.3857 0.4033 0.3674 0.0358 0.3498 1000
R2 0.1380 0.1554 0.1264 0.0290 0.1090 1000
Intercept 0.0000 0.0000 -0.0629 0.0629 -0.0629 1000
Slope 1.0000 1.0000 0.9034 0.0966 0.9034 1000
Emax 0.0000 0.0000 0.0334 0.0334 0.0334 1000
D 0.1011 0.1154 0.0920 0.0234 0.0778 1000
U -0.0050 -0.0050 0.0015 -0.0065 0.0015 1000
Q 0.1061 0.1204 0.0905 0.0299 0.0762 1000
B 0.1947 0.1915 0.1977 -0.0062 0.2009 1000
g 0.8378 0.9011 0.7963 0.1048 0.7331 1000
gp 0.1673 0.1757 0.1596 0.0161 0.1511 1000
Let's concentrate on the $D_{xy}$ at the top. The first column gives the original index, $0.3857$. The column optimism gives the estimated amount of overestimation due to overfitting. The column index.corrected is the original estimate minus the optimism. Here, the bias-corrected $D_{xy}$ ($0.3498$) is somewhat smaller than the original. The corresponding bias-corrected c-index (AUC) is $c=\frac{1+D_{xy}}{2}=0.6749$.
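This computation can be sketched directly from the validate() result above (assuming the my.valid object from the earlier call):

```r
## Bias-corrected c-index from the validated Somers' Dxy
Dxy_corrected <- my.valid["Dxy", "index.corrected"]  # 0.3498 in the run above
c_corrected   <- (1 + Dxy_corrected) / 2             # 0.6749
c_corrected
```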
We can also calculate a calibration curve using resampling:
#-----------------------------------------------------------------------------
# Calibration curve using bootstrap
#-----------------------------------------------------------------------------
my.calib <- calibrate(mylogit, method="boot", B=1000)
par(bg="white", las=1)
plot(my.calib, las=1)
n=400 Mean absolute error=0.016 Mean squared error=0.00034
0.9 Quantile of absolute error=0.025
The plot provides some evidence that our model is overfitted: it underestimates low probabilities and overestimates high probabilities. There is also a systematic overestimation around a predicted probability of $0.3$.
Predictive model building is a big topic and I suggest reading Frank Harrell's course notes.
Best Answer
Yes. The general rule of thumb is that you want 10 cases in the smaller outcome group for each predictor variable. So, with 10 IVs, you would want at least 100 buyers and 100 non-buyers.
Usually a table is presented, although what goes into it varies with the style of the journal. The American Psychological Association's style is frequently used. I would include the coefficient, its SE, and the odds ratio for each IV. Another nice thing to do is to present the predicted proportions for various combinations of the IVs, though this can get tricky with many IVs. R also has a plot() method for glm objects that gives useful default diagnostic plots.
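As a sketch of the predicted-proportions idea, reusing the mylogit fit from the admissions example above (the variable names and value grid are from that example; adapt them to your own data):

```r
## Predicted probabilities for selected combinations of the predictors
newdat <- expand.grid(gre  = c(500, 700),
                      gpa  = c(3.0, 3.7),
                      rank = factor(1:4))
## predict.lrm with type = "fitted" returns predicted probabilities
newdat$prob <- predict(mylogit, newdata = newdat, type = "fitted")
newdat
```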