Solved – Details of binary logistic regression, estimating $P(Y=1|X)$

logistic-regression

I am trying to understand logistic regression, but most sources I have found leave the actual computational step as a sort of "black box", e.g. R's glm(y ~ x, …), which obscures the underlying computation. As an exercise, I want to write my own routine for (binary) logistic regression, which requires knowing the details.

For binary outcomes $Y$, and some predictor data $X$, we attempt to model the conditional probability

$$ p(X) = P(Y=1|X) = \frac{1}{1+ e^{-\beta X}} $$

and the corresponding linear model

$$\ell = \log\bigg( \frac{p}{1-p} \bigg) = \beta X$$
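To make the link concrete, here is a tiny R sketch (the helper names sigmoid and logit are my own) showing that applying the logit to $p(X)$ recovers the linear predictor:

sigmoid <- function(eta) 1 / (1 + exp(-eta))  # inverse link: linear predictor -> probability
logit   <- function(p) log(p / (1 - p))       # link: probability -> log odds

eta <- 0.7           # some value of the linear predictor beta X
p   <- sigmoid(eta)  # P(Y=1|X)
logit(p)             # returns 0.7, recovering beta X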

How do we now proceed in practice to compute $\beta$? In my understanding we have something like a linear system, so that for $n$ measurements & $k$ predictive variables

$$l_1 = \beta_0 + \beta_1 x_{11} + \dots + \beta_k x_{1k}$$
$$l_2 = \beta_0 + \beta_1 x_{21} + \dots + \beta_k x_{2k}$$
$$\vdots$$
$$l_n = \beta_0 + \beta_1 x_{n1} + \dots + \beta_k x_{nk}$$

which could then be solved using standard methods (such as SGD for an appropriate cost function). But how do we compute/estimate the values $l_i$? The observed value of $P(Y=1|X)$ would simply be the scalar $p^* = \frac{1}{n}\sum_{i=1}^n y_i$, the same for every row, meaning that the system to be solved would be

$$ \log\bigg(\frac{p^*}{1-p^*}\bigg) (1,1,\dots,1)^T = \beta X$$

Is this correct? I have been testing with the following R code:

set.seed(1234)
x <- rnorm(1000)
# simulate binary outcomes from a true logistic model with beta = (-2.3, 0.1)
y <- rbinom(1000, 1, exp(-2.3 + 0.1*x)/(1+exp(-2.3 + 0.1*x)))

# logistic regression via glm()
fit = glm(y ~ x, family=binomial(link='logit'))
fit$coefficients

# my proposed approach: regress the constant empirical log odds on x
p = sum(y)/length(y)
z = rep(p, 1000)
z_star = log(z/(1-z))
fit2 = lm(z_star ~ x)
fit2$coefficients

which gives the following coefficient estimates for $\beta$:

> fit$coefficients
(Intercept)           x 
 -2.2261215   0.1651474 

> fit2$coefficients
  (Intercept)             x 
-2.219647e+00 -5.338566e-16 

which differ noticeably in the estimate of the $x$ coefficient. I thought this might be because of the different optimization methods used by glm() and lm(), but is my understanding of the modelling procedure correct?

Best Answer

The fit from the linear model is very different because it's a linear probability model, i.e. you're directly estimating

$$ P(Y=1|X=x) = \beta_0 + \beta_1 x $$

whereas in logistic regression you're modeling the log odds as linear in the predictors

$$ \log \left( \frac{P(Y=1|X=x)}{P(Y=0|X=x)} \right) = \beta_0 + \beta_1 x $$

which is very different, i.e., the coefficient $\beta_1$ has a totally different meaning.
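As a quick illustration of that difference (re-simulating the data from the question; the object names here are just illustrative), the slope of a linear probability model fit with lm() is a change in probability per unit of $x$, whereas the exponentiated logistic slope is an odds ratio:

set.seed(1234)
x <- rnorm(1000)
y <- rbinom(1000, 1, exp(-2.3 + 0.1*x)/(1+exp(-2.3 + 0.1*x)))

lpm <- lm(y ~ x)            # linear probability model
coef(lpm)["x"]              # additive change in P(Y=1|X=x) per unit of x

logit_fit <- glm(y ~ x, family = binomial)
exp(coef(logit_fit)["x"])   # multiplicative change in the odds per unit of x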

Also, to clarify: you estimate the coefficients in a logistic regression model by maximum likelihood, i.e. by maximizing the binomial (actually Bernoulli) likelihood, not by solving a system of equations.
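Since you want to implement the computation yourself: the standard approach is Newton-Raphson, which for this model is equivalent to iteratively reweighted least squares (IRLS). Here is a minimal sketch in R; the function logistic_mle is my own illustrative routine, not glm()'s exact internals:

logistic_mle <- function(X, y, tol = 1e-10, max_iter = 100) {
  X <- cbind(1, X)                   # prepend a column of 1s for the intercept
  beta <- rep(0, ncol(X))            # starting values
  for (i in seq_len(max_iter)) {
    p <- 1 / (1 + exp(-X %*% beta))  # current fitted probabilities
    W <- as.vector(p * (1 - p))      # Bernoulli variance weights
    score <- t(X) %*% (y - p)        # gradient of the log-likelihood
    hess  <- t(X) %*% (X * W)        # X'WX, the Fisher information
    step  <- solve(hess, score)      # Newton step
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  drop(beta)
}

logistic_mle(x, y)   # on the simulated data above: matches fit$coefficients

Each Newton step solves a weighted least-squares problem, which is why the procedure is called IRLS; glm() uses essentially this scheme under the hood.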
