Consider a logistic regression on these data:
X1  X2  Y
 1   0  0
 1   0  1
 0   1  0
 0   1  0
 0   1  0
 0   1  1
 1   1  1
R accepts three different representations of these data: one row per observation, and two condensed representations (one giving the proportion of successes with weights, one giving counts of successes and failures). In my mind, these three specifications should be mathematically identical: the data are the same 7 observations, just presented to R in different formats.
data1 <- data.frame(x1=c(1,1,0,0,0,0,1), x2=c(0,0,1,1,1,1,1), y=c(0,1,0,0,0,1,1))
data2 <- data.frame(x1=c(0,1,0,1), x2=c(0,0,1,1), y=c(0,0.5,0.25,1), w=c(0,2,4,1))
data3x <- data.frame(x1=c(0,1,0,1), x2=c(0,0,1,1))
data3y <- cbind(c(0,1,1,1), c(0,1,3,0))
model1 <- glm(y~x1+x2, data=data1, family="binomial")
model2 <- glm(y~x1+x2, data=data2, family="binomial", weights=w)
model3 <- glm(data3y~data3x$x1+data3x$x2, family="binomial")
Models 2 and 3 are the same, which makes sense. But Model 1 differs from Models 2 and 3, and I can't reason out why the same data should return different model statistics (coefficients, null and residual deviance). Models 2 and 3 just use a different representation of the same data.
This might be a red herring, but Model 1's coefficients are shifted by 4 units relative to Model 2's, which is exactly the difference in the number of (populated) rows/residual degrees of freedom between the two fits.
> model1
Call: glm(formula = y ~ x1 + x2, family = "binomial", data = data1)
Coefficients:
(Intercept) x1 x2
-19.66 19.66 18.57
Degrees of Freedom: 6 Total (i.e. Null); 4 Residual
Null Deviance: 9.561
Residual Deviance: 7.271 AIC: 13.27
> model2
Call: glm(formula = y ~ x1 + x2, family = "binomial", data = data2,
weights = w)
Coefficients:
(Intercept) x1 x2
-23.66 23.66 22.57
Degrees of Freedom: 2 Total (i.e. Null); 0 Residual
Null Deviance: 2.289
Residual Deviance: 3.167e-10 AIC: 9.112
Best Answer
The model is
$$\operatorname{E}Y = \frac{1}{1+ \exp[-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)]}$$
and it's saturated, having as many parameters to estimate (three) as there are distinct covariate patterns with positive weight. Setting the logit $\beta_0 + \beta_1 x_1 + \beta_2 x_2$ equal to $\operatorname{logit} \operatorname{E}Y$ at each pattern gives the equations to solve:
For $x_1=1$, $x_2 =0$, $\operatorname{E}Y=\frac{1}{2}$
$\beta_0 + \beta_1 = 0$
For $x_1=0$, $x_2 =1$, $\operatorname{E}Y=\frac{1}{4}$
$\beta_0 + \beta_2 = -\log 3$
For $x_1=1$, $x_2=1$, $\operatorname{E}Y=1$
$\beta_0 + \beta_1 + \beta_2 = \infty$
There is quasi-complete separation: whenever $x_1 + x_2 > 1$ (that is, $x_1 = x_2 = 1$), $y = 1$, so the maximum-likelihood estimates of the coefficients are unbounded. But any sufficiently large value $c$ can stand in for infinity, giving the solutions:
$\beta_0 = -(c + \log 3)$
$\beta_1 = c + \log3$
$\beta_2 = c$
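As a quick numerical check (a sketch; the exact magnitude of $c$ depends on where `glm` stops iterating), the divergent part $c$ cancels in certain linear combinations of the coefficients, which can be recovered from the fit of Model 1:

```r
# Refit Model 1 as in the question (glm will warn about fitted
# probabilities numerically 0 or 1 -- that is the separation)
data1 <- data.frame(x1 = c(1, 1, 0, 0, 0, 0, 1),
                    x2 = c(0, 0, 1, 1, 1, 1, 1),
                    y  = c(0, 1, 0, 0, 0, 1, 1))
model1 <- glm(y ~ x1 + x2, data = data1, family = "binomial")
b <- coef(model1)

# c cancels in these combinations, so they are well determined:
b[["(Intercept)"]] + b[["x1"]]   # ~ 0       (EY = 1/2 at x1=1, x2=0)
b[["x1"]] - b[["x2"]]            # ~ log(3)  (since beta1 = c + log 3, beta2 = c)
b[["(Intercept)"]] + b[["x2"]]   # ~ -log(3) (EY = 1/4 at x1=0, x2=1)
```

With the Model 1 output above, $19.66 - 18.57 \approx 1.09 \approx \log 3$, as predicted.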
I don't know why `glm` gives up trying to maximize the likelihood at different values of $c$ depending on the data structure, but it's of no practical consequence: predictions and differences in likelihoods will be almost the same. (The deviances themselves differ legitimately between the ungrouped and grouped fits, because the saturated model that serves as the deviance baseline is not the same for Bernoulli data as for grouped binomial data.)
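To see that the different stopping points really don't matter in practice, here is a sketch comparing predicted probabilities from the ungrouped fit (Model 1) and the grouped fit (Model 3, written here with a `data` argument so that `predict` works on new data):

```r
# Ungrouped (one row per observation) and grouped (successes/failures) fits
data1 <- data.frame(x1 = c(1, 1, 0, 0, 0, 0, 1),
                    x2 = c(0, 0, 1, 1, 1, 1, 1),
                    y  = c(0, 1, 0, 0, 0, 1, 1))
data3x <- data.frame(x1 = c(0, 1, 0, 1), x2 = c(0, 0, 1, 1))
data3y <- cbind(c(0, 1, 1, 1), c(0, 1, 3, 0))  # columns: successes, failures

model1 <- glm(y ~ x1 + x2, data = data1, family = "binomial")
model3 <- glm(data3y ~ x1 + x2, data = data3x, family = "binomial")

# Predicted probabilities at the three populated covariate patterns
newd <- data.frame(x1 = c(1, 0, 1), x2 = c(0, 1, 1))
cbind(p1 = predict(model1, newdata = newd, type = "response"),
      p3 = predict(model3, newdata = newd, type = "response"))
# Both columns are, up to numerical fuzz, 0.5, 0.25, 1
```

Despite the coefficients differing by the arbitrary offset $c$, the fitted probabilities agree.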