Solved – Logistic regression with binomial data in Python

logistic, python, regression, statsmodels

This is probably trivial, but I couldn't figure it out. I want to fit a logistic regression model where the dependent variable is not a Bernoulli variable but a binomial count. Namely, for each $X_i$ I have $s_i$, the number of successes, and $n_i$, the number of trials. This is completely equivalent to the Bernoulli case, as if we had observed each of the $n_i$ trials individually, so in principle I can use, e.g., the statsmodels logistic regression after unravelling my data into Bernoulli observations. Is there a simpler way?

Best Answer

The statsmodels package has a GLM class that can be used for such problems. See the example below:

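import statsmodels.api as sm

# Star98 example dataset, as used in the statsmodels GLM notebook linked below
data = sm.datasets.star98.load()
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial())
res = glm_binom.fit()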

More details can be found at the link below. Note that the binomial family accepts a 2d array with two columns as the response, where each row is [success, failure]; in the question's notation, that is $[s_i, n_i - s_i]$. In the above example, which I took from that link, data.endog is such a two-dimensional array (Success: NABOVE, Failure: NBELOW).

Relevant documentation: https://www.statsmodels.org/stable/examples/notebooks/generated/glm.html
