Logistic Regression – Is the i.i.d. Assumption Required for Logistic Regression?

Tags: assumptions, iid, logistic-regression

Is there an i.i.d. assumption on the response variable in logistic regression?

For example, suppose we have $1000$ data points. It seems the response $Y_i$ comes from a Bernoulli distribution with $p_i=\text{logit}^{-1}(\beta_0+\beta_1 x_i)$. Therefore, we have $1000$ Bernoulli distributions, each with a different parameter $p_i$.

So they are "independent", but not "identical".
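
To make this concrete, here is a minimal simulation sketch in Python (the coefficient values and the covariate are made up purely for illustration): each response is drawn independently, but from a Bernoulli distribution with its own $p_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients and covariate, chosen only for illustration
beta0, beta1 = -1.0, 2.0
x = rng.normal(size=1000)

# Each observation gets its own Bernoulli parameter p_i = logit^{-1}(beta0 + beta1 * x_i)
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

# Independent draws, but each from a Bernoulli distribution with a different p_i
y = rng.binomial(1, p)

print(p[:5])   # five different success probabilities
print(y[:5])   # the corresponding 0/1 responses
```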

Am I right?


PS. I learned logistic regression from the "machine learning" literature, where we optimize an objective function and check whether it performs well on test data, without talking much about assumptions.

My question started with the post Understand Link Function in Generalized Linear Model, where I tried to learn more about the statistical assumptions.

Best Answer

From your previous question you learned that a GLM is specified in terms of a probability distribution, a linear predictor $\eta$, and a link function $g$, and is written as

$$ \begin{align} \eta &= X\beta \\ E(Y|X) &= \mu = g^{-1}(\eta) \end{align} $$

where $g$ is the logit link function and $Y$ is assumed to follow a Bernoulli distribution

$$ Y_i \sim \mathcal{B}(\mu_i) $$

Each $Y_i$ follows a Bernoulli distribution with its own mean $\mu_i$ that is conditional on $X$. We are not assuming that each $Y_i$ comes from the same distribution with the same mean (that would be the intercept-only model, where $\mu_i = g^{-1}(\beta_0)$ for all $i$), but that each can have a different mean. We do assume that the $Y_i$'s are independent, i.e. we do not have to worry about things such as autocorrelation between subsequent $Y_i$ values.
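
A short sketch makes the point (assuming the statsmodels package is available; the simulated data reuse the hypothetical coefficients from the question): the fitted model returns a different conditional mean $\hat\mu_i$ for every observation, not one shared mean.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated data with hypothetical coefficients, only for illustration
x = rng.normal(size=1000)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = rng.binomial(1, p)

# Fit a Bernoulli GLM with the (default) logit link
X = sm.add_constant(x)                              # design matrix [1, x_i]
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

print(fit.params)              # estimates of beta_0 and beta_1
print(fit.fittedvalues[:5])    # mu_i = g^{-1}(eta_i): a different mean per observation
```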

The i.i.d. assumption concerns the errors in linear regression (i.e. the Gaussian GLM), where the model is

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i = \mu_i + \varepsilon_i $$

where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, so we have i.i.d. noise around $\mu_i$. This is why we are interested in residual diagnostics and pay attention to the residuals vs. fitted plot. In the case of GLMs like logistic regression it is not that simple, since there is no additive noise term as in the Gaussian model (see here, here and here). We still want the residuals to be "random" around zero, and we do not want to see any trends in them, because trends would suggest that there are effects not accounted for in the model; but we do not assume that they are normal and/or i.i.d. See also the On the importance of the i.i.d. assumption in statistical learning thread.
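
A rough sketch of such a diagnostic for a Bernoulli model (Python, reusing the simulated setup with hypothetical coefficients; with real data the true means below would be replaced by the fitted $\hat\mu_i$):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Simulated logistic-regression data (hypothetical coefficients, as before)
x = rng.normal(size=1000)
mu = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))    # means mu_i; in practice, fitted values
y = rng.binomial(1, mu)

# There is no additive noise term, so instead of raw errors we inspect Pearson
# residuals: (y_i - mu_i) / sd(Y_i), with sd(Y_i) = sqrt(mu_i * (1 - mu_i))
pearson = (y - mu) / np.sqrt(mu * (1 - mu))

# We want these scattered around zero with no trend against the fitted values,
# but we do not assume they are normal or i.i.d.
plt.scatter(mu, pearson, s=5)
plt.axhline(0.0, color="grey")
plt.xlabel("mean $\\mu_i$")
plt.ylabel("Pearson residual")
plt.show()
```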

As a side note, notice that we can even drop the assumption that each $Y_i$ comes from the same kind of distribution. There are (non-GLM) models that assume that different $Y_i$'s can have different distributions with different parameters, i.e. that your data come from a mixture of different distributions. In such cases we would still assume that the $Y_i$ values are independent, since dependent values coming from different distributions with different parameters (i.e. typical real-world data) would in most cases be too complicated to model (often impossible).
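
For illustration only, here is a tiny sketch (with hypothetical parameters) of independent observations drawn from a two-component mixture, so the $Y_i$ are independent but do not share a single distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical two-component mixture: each Y_i is drawn independently, either from
# N(0, 1) or from N(3, 0.5), so the Y_i do not come from one common distribution.
component = rng.binomial(1, 0.3, size=n)      # latent component label per observation
y = np.where(component == 0,
             rng.normal(0.0, 1.0, size=n),
             rng.normal(3.0, 0.5, size=n))

print(y[:5])
```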