Regression – Estimating the Bernoulli Parameter in Logistic Regression: Is It Accurate?

distributions, logistic, machine learning, probability, regression

In logistic regression, we often use maximum likelihood to estimate the parameter vector $\boldsymbol{\beta}$ that parametrizes the logistic equation.
My confusion stems from the following:

  1. We know that logistic regression models the conditional probability of $Y$ given $X$, i.e. $P(Y = 1 \mid X)$ in the binary case.
  2. We also know that, in the binary case, the conditional distribution of $Y$ given $X$ is Bernoulli, i.e. $Y \mid X \sim \text{Ber}(p)$.
  3. Here is my confusion: after maximum likelihood estimation we arrive at a set of “optimal” parameters $\boldsymbol{\beta}$. Is the parameter found the same as $p$, the parameter of the Bernoulli distribution? My mind is fixated on the idea that, since the likelihood of $Y$ given $X$ is Bernoulli, we should be finding the $p$ that maximises the likelihood of the data.

——

An attempt to answer this: finding $\boldsymbol{\beta}$ is equivalent to finding $p$ for the conditional distribution of $Y$ given a particular value of $X$, so in that sense they are the same.

EDIT: To clarify my question: by the definition of maximum likelihood, we are finding the parameter that maximises the likelihood of the conditional distribution $Y \mid X$, which is Bernoulli. So my state of mind is that the parameter should be $p$, yet of course we end up finding $\boldsymbol{\beta}$. I understand that the logistic model is linear in the log odds with coefficients $\boldsymbol{\beta}$; what I fail to reconcile is whether maximum likelihood, by its definition, returns the parameter $p$ or $\boldsymbol{\beta}$, or whether it does not matter in this context since $\boldsymbol{\beta}$ and $p$ are linked.
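To spell out the link I have in mind (this is just the standard logistic model, restated for concreteness): the log odds are linear in $X$,

$$ \log\frac{p}{1-p} = \boldsymbol{\beta}X \quad\Longleftrightarrow\quad p = \frac{1}{1 + e^{-\boldsymbol{\beta}X}}, $$

so for a given $X$, the Bernoulli parameter $p$ is completely determined by $\boldsymbol{\beta}$.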

Best Answer

The logistic regression model is a kind of generalized linear model, so it consists of the linear predictor

$$ \eta = \boldsymbol{\beta}X $$

We pass it through the inverse of the link function $g$ (the logistic function) to obtain $p$, the conditional mean of the Bernoulli distribution:

$$ E[Y|X] = p = g^{-1}(\eta) $$

Since $Y$ is binary, we have

$$ Y|X \sim \mathsf{Bernoulli}(p) $$

So $\boldsymbol{\beta} \ne p$, but $g^{-1}(\boldsymbol{\beta}X) = p$: logistic regression predicts the conditional mean of the Bernoulli distribution.
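Here is a minimal numerical sketch of that distinction, assuming NumPy and scikit-learn are available (my choice of tools, not part of the original answer): the estimate $\hat{\boldsymbol{\beta}}$ is a single fixed coefficient vector, while the fitted $p$'s come from pushing $\boldsymbol{\beta}X$ through the inverse link and differ from observation to observation.

```python
# A toy simulation: beta is one fixed coefficient vector, while p varies with X.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "True" coefficients (intercept and one slope) -- these play the role of beta.
beta_true = np.array([-1.0, 2.0])

# Design matrix: an intercept column plus one predictor.
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Inverse link (the logistic function): one conditional mean p_i per observation.
p_true = 1.0 / (1.0 + np.exp(-(X @ beta_true)))

# Draw Bernoulli outcomes with those conditional means.
y = rng.binomial(1, p_true)

# Fit by maximum likelihood; a very large C makes the default penalty negligible.
model = LogisticRegression(C=1e6).fit(x.reshape(-1, 1), y)
beta_hat = np.array([model.intercept_[0], model.coef_[0, 0]])

# beta_hat estimates beta, not p; the fitted p's come from applying the inverse
# link to X @ beta_hat, and they differ across observations.
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))

print("beta_hat:", beta_hat)               # roughly [-1, 2]
print("first few fitted p's:", p_hat[:5])  # values in (0, 1), varying with x
```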

Regarding your comment, in maximum likelihood, we are estimating the parameters $\boldsymbol{\beta}$ of our model by maximizing

$$ \hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,max}} \; \mathsf{Bernoulli}\big(y \,|\, g^{-1}(\boldsymbol{\beta}X) \big) $$

(forgive the slight abuse of notation). Here $p$ is a function of $X$ and $\boldsymbol{\beta}$ rather than a standalone parameter, and nothing in the definition of maximum likelihood prohibits us from doing this.
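To make that concrete, here is a sketch of the same maximization written out by hand (again in Python with NumPy/SciPy, my own assumption): the objective is the Bernoulli log-likelihood, $p$ appears only as $g^{-1}(\boldsymbol{\beta}X)$ inside it, and the optimizer searches over $\boldsymbol{\beta}$ alone.

```python
# Maximum likelihood "by hand": the free parameters are beta; p is derived from
# beta and X inside the likelihood, not optimized as a standalone quantity.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    # p_i = g^{-1}(x_i . beta): the Bernoulli mean for observation i.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    eps = 1e-12  # guard against log(0)
    # Negative Bernoulli log-likelihood: -sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Simulated data (intercept plus one predictor), as in the sketch above.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-1.0, 2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

# Maximize the likelihood over beta (by minimizing its negative).
result = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y), method="BFGS")
print("beta_hat:", result.x)  # should be close to beta_true
```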
