In logistic regression, we often use maximum likelihood to estimate the parameter vector $\boldsymbol{\beta}$ that parametrizes the logistic equation.
My confusion stems from the following:
- We know that logistic regression models the conditional probability of $Y$ given $X$, i.e. $P(Y = 1 \mid X)$ for the binary case.
- We also know that the conditional distribution of $Y$ given $X$ is Bernoulli in the binary case, i.e. $Y \mid X \sim \text{Ber}(p)$.
- Now my confusion: after maximum likelihood estimation we obtain a set of “optimal” parameters $\boldsymbol{\beta}$. Is the parameter found the same as $p$, where $p$ is the parameter of the Bernoulli distribution? My mind is fixated on the idea that, since the likelihood function of $Y$ given $X$ is Bernoulli, we should be finding the $p$ that maximises the likelihood of the data.
——
An attempt to answer this: finding the $\boldsymbol{\beta}$ is equivalent to finding the $p$ for the conditional distribution of $Y$ given a certain $X$ value. So they are the same.
EDIT: To clarify my question: by the definition of maximum likelihood, we are finding the parameter that maximises the likelihood under the conditional distribution of $Y \mid X$, which is Bernoulli. So my state of mind is that the parameter should be $p$, but of course we end up finding $\boldsymbol{\beta}$. I understand the logistic model, which is linear in the log odds with coefficients $\boldsymbol{\beta}$. What I fail to reconcile is whether, by definition, maximum likelihood is returning the parameter $p$ or $\boldsymbol{\beta}$, or whether it does not matter in this context since $\boldsymbol{\beta}$ and $p$ are linked.
Best Answer
The logistic regression model is a kind of generalized linear model, so it consists of the linear predictor
$$ \eta = \boldsymbol{\beta}X $$
we pass it through the inverse of the link function $g$ (here $g$ is the logit, so $g^{-1}$ is the logistic function) to obtain $p$, i.e. the conditional mean of the Bernoulli distribution
$$ E[Y|X] = p = g^{-1}(\eta) $$
since $Y$ is binary, we have
$$ Y|X \sim \mathsf{Bernoulli}(p) $$
so $\boldsymbol{\beta} \ne p$, but $g^{-1}(\boldsymbol{\beta}X) = p$. Logistic regression predicts the mean of the Bernoulli distribution.
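To make the distinction concrete, here is a minimal Python sketch. The coefficient values and the two feature vectors are made up for illustration, and the function names are my own. The point is only that a single fixed $\boldsymbol{\beta}$ yields a different $p$ for each $X$.

```python
import math

def inverse_link(eta):
    """Logistic (sigmoid) function, the inverse of the logit link g."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_mean(beta, x):
    """Return p = g^{-1}(beta . x) for a single observation x."""
    eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
    return inverse_link(eta)

# Hypothetical coefficients: an intercept and one slope.
beta = [-1.0, 2.0]

# Two observations; the leading 1.0 multiplies the intercept.
p_a = bernoulli_mean(beta, [1.0, 0.0])  # eta = -1
p_b = bernoulli_mean(beta, [1.0, 1.0])  # eta = +1
print(p_a, p_b)
```

The same `beta` is used in both calls, yet `p_a` and `p_b` differ because $p$ depends on $X$ through the linear predictor.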
Regarding your comment, in maximum likelihood, we are estimating the parameters $\boldsymbol{\beta}$ of our model by maximizing
$$ \hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,max}} \; \mathsf{Bernoulli}\big(y \,|\, g^{-1}(\boldsymbol{\beta}X) \big) $$
(forgive me for the slight abuse of notation). Here $p$ is a function of $X$ and $\boldsymbol{\beta}$, rather than a standalone parameter. Nothing in the definition of maximum likelihood prohibits us from doing this.
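As a rough illustration of that maximization, the sketch below writes the Bernoulli log-likelihood as a function of $\boldsymbol{\beta}$ and climbs it with plain gradient ascent. This is only one possible optimizer, chosen for brevity; statistical packages typically use other methods such as iteratively reweighted least squares. The data, step size, and iteration count are invented for the example.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta, X, y):
    """Sum of Bernoulli log-pmfs with p_i = sigmoid(beta . x_i)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p_i = sigmoid(sum(b * v for b, v in zip(beta, x_i)))
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total

def fit(X, y, lr=0.05, steps=5000):
    """Maximize the log-likelihood over beta by gradient ascent."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x_i, y_i in zip(X, y):
            p_i = sigmoid(sum(b * v for b, v in zip(beta, x_i)))
            for j, v in enumerate(x_i):
                grad[j] += (y_i - p_i) * v  # score contribution of observation i
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# Made-up data: intercept column plus one feature. The classes overlap,
# so the maximum likelihood estimate is finite.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]

beta_hat = fit(X, y)
# The fitted p is recovered from beta_hat, one value per observation.
p_hat = [sigmoid(sum(b * v for b, v in zip(beta_hat, x_i))) for x_i in X]
print(beta_hat, p_hat)
```

The optimizer only ever updates `beta`; each `p_i` is recomputed from it, which is the sense in which $p$ is not a standalone parameter here.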