Solved – Logistic regression with grouped data

logistic, python

This is likely a simple question, but I have searched for an answer, I promise!

I'm looking to fit a logistic regression on a dataset in which the rows are grouped by an ID, with exactly one positive flag per group and groups that vary in size. A simple example would be predicting who is the heaviest in a group of individuals, given their age and height:

| GroupId | Age | Height | Heaviest |
|---------|-----|--------|----------|
| 1       | 27  | 198    | 1        |
| 1       | 42  | 165    | 0        |
| 1       | 34  | 133    | 0        |
| 2       | 63  | 176    | 1        |
| 2       | 27  | 189    | 0        |
| 2       | 55  | 165    | 0        |
| 2       | 44  | 166    | 0        |

My question is how to incorporate the grouping information into a logistic regression, since many of the positive flags will be on individuals who shouldn't be characterised as 'Heaviest' in a global sense. Or am I thinking about it the wrong way? For what it's worth, I'm working with Python's statsmodels library.

Best Answer

After following the leads kindly provided in this thread, I've concluded that a Cox proportional hazards model is probably the most appropriate choice, since it allows the data to be stratified by an ID as above.
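To see why stratification helps here: with one positive row per group, each stratum's contribution to the Cox partial likelihood is the positive row's exp(score) divided by the sum of exp(score) over the whole group, so rows are only ever compared with others in their own group. Below is a minimal standard-library sketch of that quantity on the toy rows from the question. The function names and the coefficient values are made up for illustration and are not fitted estimates.

```python
import math

# Toy rows from the question: (GroupId, Age, Height, Heaviest)
rows = [
    (1, 27, 198, 1), (1, 42, 165, 0), (1, 34, 133, 0),
    (2, 63, 176, 1), (2, 27, 189, 0), (2, 55, 165, 0), (2, 44, 166, 0),
]

def within_group_probs(rows, beta_age, beta_height):
    """P(row is the positive one | its group), a softmax over each group."""
    scores = [beta_age * r[1] + beta_height * r[2] for r in rows]
    probs = []
    for i, row in enumerate(rows):
        # Scores of every row that shares this row's GroupId
        group_scores = [scores[j] for j, other in enumerate(rows) if other[0] == row[0]]
        top = max(group_scores)  # subtract the max for numerical stability
        denom = sum(math.exp(s - top) for s in group_scores)
        probs.append(math.exp(scores[i] - top) / denom)
    return probs

def conditional_log_likelihood(rows, beta_age, beta_height):
    """Sum of log within-group probabilities of the flagged rows."""
    probs = within_group_probs(rows, beta_age, beta_height)
    return sum(math.log(p) for row, p in zip(rows, probs) if row[3] == 1)

# Hypothetical coefficients, chosen only to show the mechanics
probs = within_group_probs(rows, beta_age=0.0, beta_height=0.05)
for row, p in zip(rows, probs):
    print(row, round(p, 3))
```

Because each probability is normalised within its own group, a row flagged as 'Heaviest' only has to outscore its group mates, not every individual in the dataset.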

For the curious, I came across lifelines for Python, which has a good implementation, and I have been running some moderately successful tests with it.

Thanks all!
