Solved – Hierarchical modelling in Python with statsmodels

multilevel-analysispythonstatsmodels

I have a dataset with random effects at different hierarchies and now I want to analyze how they influence my target variable. Somehow I'm looking into statsmodels Linear Mixed Effect Models to solve my issue. Though I can't figure out through the documentation how to achieve my goal.

To pick up the example from statsmodels with the dietox dataset my example is:

import statsmodels.api as sm
import statsmodels.formula.api as smf

data = sm.datasets.get_rdataset("dietox", "geepack").data
# Only take the last week
data = data.copy().query("Time == 12")
# Convert Vitamin E to number
data["Evit"] = data["Evit"].map(lambda s: int(s.replace("Evit", "")))
md = smf.mixedlm("Weight ~ Feed + Evit", data, groups=data["Evit"])
mdf = md.fit()
print(mdf.summary())

I want to predict the pigs weight in week 12 from the cumulated food intake Feed and vitamin E dosage Evit. The hierarchy is supposed to be groups sharing a vitamin E dose that have multiple pigs assigned to them.

I would expect to have a model that for every $Weight$ in the groups $i = 1…N$ and for every pig $j = 1…M$
$$Weight_{ij} = \beta_0 + \beta_1 Evit_{i} + \gamma_{0i} + \gamma_{1i} Feed_{ij}$$
then $\beta_0$ should be the common intercept shared among all Pigs. $\beta_1$ would be the slope that shows the effect of vitamin E. $\gamma_{0i}$ is the intercept for the group receiving the same vitamin E injections (this does not make much sense with this dataset, but it's required for my actual problem). $\gamma_{1i}$ is the groups slope for the Feed, so that it shows if vitamin dosis lead to different weight gain with the same amount of food.

I would also be interested in how to remove the intercept.

In my actual dataset the group (Evit) determines more variables (for example $Light$) that are the same within the group but different between the groups. I would also like to include these. Also there are more variables that are individual (for example $Hairiness$). So that the final model could look more like this:
$$Weight_{ij} = \beta_0 + \beta_1 Evit_{i} + \beta_2 Light_{i} + \gamma_{0i} + \gamma_{1i} Feed_{ij} + \gamma_{2i} Hairiness_{ij}$$

The question is how do I model this in statsmodels or another Python library.

Best Answer

As I understand it, your beta's are fixed coefficients and your gamma's are random coefficients. You only have one level of clustering (j within i).

Your data should be in long form (one row per observation, not one row per group).

If you are using formulas, you would have a formula for the fixed effects part of the model (the mean structure) and a formula for the random effects part of the model. You also need an indicator variable defining the groups. It would look something like this:

model = sm.MixedLM.from_formula("weight ~ evit+light", groups="pig", re_formula="feed + hairiness")
result = model.fit()
Related Question