Time-varying predictors at higher aggregation levels in multilevel survival analysis

multilevel-analysis, random-effects-model, survival, time-varying-covariate

The case: I am trying to estimate event history models (also known as survival models) with time-varying predictors at two different levels of (geographical) aggregation. More precisely, I am using a discrete-time event history model (a logit model on stacked data) to predict the odds of outmigration (mig) at the household level. Each household is exposed to the hazard of migration over a certain period (in this example, three years; variable exposure). I have a number of time-varying household-level predictors (e.g., wx = cumulative work experience of the household head) and time-invariant ones (e.g., fem = household head is female) to control for the effect of various socio-demographic characteristics on the decision to migrate. However, the households in my sample are located in different municipalities (MunID). In my research, I am interested in how a set of time-varying characteristics of the environment that operate at the municipality level (Env1, e.g., rainfall decline) affect the odds of household-level outmigration. I also need to control for some time-invariant municipality-level characteristics (Env2, e.g., % of land used for agricultural production). A simplified example of the data structure is presented in the table below.

[Table (image not available): simplified example of the data structure]
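Since the original table is only an image, here is a rough sketch of what "stacked" (person-period) data for a discrete-time model looks like: one row per household per exposure year, with mig = 1 only in the year a household migrates, after which it contributes no further rows. This is a minimal Python illustration, not the asker's code. The Household fields (hhid, mun_id, first_year, last_year, migrated, wx0, fem), the Env1/Env2 lookups keyed by municipality (and year), and the toy values are all hypothetical, and wx is simplified to grow by one year per year at risk.

```python
from dataclasses import dataclass


@dataclass
class Household:
    hhid: int
    mun_id: int
    first_year: int   # first calendar year at risk of migration
    last_year: int    # year of migration, or last year observed if censored
    migrated: bool    # True if the spell ended in outmigration
    wx0: float        # head's work experience (years) at first_year (hypothetical)
    fem: int          # 1 if the household head is female


def stack(households, env1, env2):
    """Expand household spells into household-year rows (person-period data)."""
    rows = []
    for hh in households:
        for year in range(hh.first_year, hh.last_year + 1):
            t = year - hh.first_year + 1  # exposure year: 1, 2, 3, ...
            rows.append({
                "HHID": hh.hhid,
                "MunID": hh.mun_id,
                "exposure": t,
                "year": year,
                # event indicator: 1 only in the last year, and only if the household migrated
                "mig": int(hh.migrated and year == hh.last_year),
                # time-varying household covariate (simplification: +1 year of experience per year)
                "wx": hh.wx0 + (t - 1),
                "fem": hh.fem,                    # time-invariant, household level
                "Env1": env1[(hh.mun_id, year)],  # time-varying, municipality level
                "Env2": env2[hh.mun_id],          # time-invariant, municipality level
            })
    return rows


# Hypothetical toy data: households 1 and 2 live in municipality 1, household 3 in municipality 2.
households = [
    Household(1, 1, 1990, 1992, True, 5.0, 0),
    Household(2, 1, 1990, 1992, False, 2.0, 1),   # censored: never migrates
    Household(3, 2, 1990, 1991, True, 10.0, 0),
]
env1 = {(1, 1990): 0.0, (1, 1991): -0.1, (1, 1992): -0.3,
        (2, 1990): 0.1, (2, 1991): 0.0, (2, 1992): -0.2}
env2 = {1: 0.4, 2: 0.7}
rows = stack(households, env1, env2)
```

Note that households 1 and 2 receive identical Env1 values in every year, which is exactly the clustering the question is worried about.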

The problem: Because I have two levels of aggregation (households clustered in municipalities), I intended to use multilevel logistic models. However, I am not sure how to specify the levels correctly. I am using R and the lme4 package to estimate the multilevel models.

Possible solutions:

  1. Courgeau (2007) describes a multilevel event history model with three levels: time (level 1) is nested within individuals (level 2), who are nested within states (level 3). However, Courgeau only mentions a time-invariant state-level predictor. In my case, the problem is that the model would not recognize a time-varying municipality-level predictor (e.g., Env1) as operating at the municipality level, because its values within each municipality vary across time. Yet if the model treats Env1 as a level-1 predictor, its standard errors will be biased (most likely underestimated), because at each time point all households within a municipality have the same Env1 value.

  2. As another option, I could use the combined MunIDy variable to specify my third level. MunIDy combines the municipality ID (MunID) with the exposure year variable (exposure) and results in n = 3 × 2 = 6 aggregation units at level 3. However, this solution also seems less than ideal, since each level-3 unit would contain household- and municipality-level values for only one exposure year (i.e., one unit would consist of all observations in a particular exposure year and a particular municipality), and I am not sure whether this would cause problems for the event history model.
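Both options hinge on the same fact: Env1 is constant within each municipality-year cell but varies within a municipality over time, so it is neither a pure level-1 nor a pure level-3 variable. A small Python sketch (hypothetical toy data and helper names) builds a MunIDy-style key and checks this. It also shows that the number of distinct Env1 cells (6 here) is bounded by the number of municipality-year units, not by the number of household-year rows (9 here), which is what lies behind the standard-error concern in option 1.

```python
from collections import defaultdict


def make_munidy(row):
    """Combine municipality ID and exposure year into one cluster key, e.g. '1_2'."""
    return f"{row['MunID']}_{row['exposure']}"


def is_constant_within(rows, var, key):
    """True if `var` takes a single value within every group defined by key(row)."""
    values = defaultdict(set)
    for r in rows:
        values[key(r)].add(r[var])
    return all(len(v) == 1 for v in values.values())


# Hypothetical toy data: 3 households in 2 municipalities, 3 exposure years each.
env1 = {(1, 1): 0.0, (1, 2): -0.1, (1, 3): -0.3,
        (2, 1): 0.1, (2, 2): 0.0, (2, 3): -0.2}
hh_mun = {1: 1, 2: 1, 3: 2}  # HHID -> MunID
rows = [{"HHID": h, "MunID": m, "exposure": t, "Env1": env1[(m, t)]}
        for h, m in hh_mun.items() for t in (1, 2, 3)]
for r in rows:
    r["MunIDy"] = make_munidy(r)

n_rows = len(rows)                             # household-year rows
n_cells = len({r["MunIDy"] for r in rows})     # municipality-year units
env1_by_cell = is_constant_within(rows, "Env1", lambda r: r["MunIDy"])  # constant per cell
env1_by_mun = is_constant_within(rows, "Env1", lambda r: r["MunID"])    # varies over time
```

Here env1_by_cell is True and env1_by_mun is False: Env1 is a municipality-by-time variable.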

Does anyone have an idea of how to specify the levels in my analysis correctly so that I can investigate the effect of time-varying predictors at level 3? Or can anyone point me to published work that uses multilevel event history analysis with time-varying predictors at higher aggregation levels? Thanks a lot for any help!

References:
Courgeau, D. (2007). Multilevel synthesis: From the group to the individual. Dordrecht, The Netherlands: Springer.

Best Answer

I think I found a solution. I read two book chapters on multilevel event history models (Courgeau, 2007; Goldstein, 2011) that discuss similar cases and suggest a three-level structure: time (level 1) nested within households (level 2), which are in turn nested within municipalities (level 3). For this structure, Goldstein (2011, p. 221) explicitly states that “The exploratory variables can be defined at any level. They may also vary over time, allowing so-called time varying covariates.”
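The three-level structure assumes strict nesting: each household belongs to exactly one municipality over its whole spell. In lme4, such nesting is commonly written as `(1 | MunID/HHID)`, which expands to `(1 | MunID) + (1 | MunID:HHID)`. Before fitting, it may be worth checking the stacked data for households observed in more than one municipality. A minimal Python sketch, assuming hypothetical column names and a globally unique HHID:

```python
from collections import defaultdict


def movers(rows, inner="HHID", outer="MunID"):
    """List inner units (households) observed in more than one outer unit (municipality).

    Assumes `inner` IDs are globally unique; if HHIDs are only unique within a
    municipality, every reused ID would be flagged here.
    """
    parents = defaultdict(set)
    for r in rows:
        parents[r[inner]].add(r[outer])
    return sorted(k for k, p in parents.items() if len(p) > 1)


def is_nested(rows, inner="HHID", outer="MunID"):
    """True if every household appears in exactly one municipality."""
    return not movers(rows, inner, outer)


# Hypothetical stacked rows; household 3 changes municipality between years.
rows = [
    {"HHID": 1, "MunID": 1, "exposure": 1},
    {"HHID": 1, "MunID": 1, "exposure": 2},
    {"HHID": 2, "MunID": 2, "exposure": 1},
    {"HHID": 3, "MunID": 2, "exposure": 1},
    {"HHID": 3, "MunID": 3, "exposure": 2},
]
print(is_nested(rows))  # False
print(movers(rows))     # [3]
```

If movers exist, the data are cross-classified rather than strictly nested, and the three-level specification would need rethinking.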

Here is a brief explanation of why I think such a three-level model can correctly incorporate time-varying predictors at the municipality level (level 3), such as the environmental variable “Env1”. Because Env1 varies across time, the model automatically treats it as a level-1 variable; it does not know that at each time step (e.g., the year 1990) all households in a given municipality share the same Env1 value. However, I don’t think this biases the standard errors for Env1, because the model includes household random effects (level 2), which give each household its own intercept. In addition, I include a second variance component at level 2 that allows the slope of Env1 to vary randomly across households. In this way, the effect of Env1 is allowed to differ from household to household.
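Written out, the model described above might look roughly as follows. This is a sketch in my own notation: α_t, β, γ, u, v, Σ_u and σ_v² are labels introduced here, not taken from Courgeau or Goldstein. t indexes exposure years (level 1), i households (level 2) and j municipalities (level 3); α_t is a separate baseline log-odds for each exposure year. In lme4 this would roughly correspond to `glmer(mig ~ factor(exposure) + wx + fem + Env1 + Env2 + (1 + Env1 | HHID) + (1 | MunID), data = stacked, family = binomial)`, where `stacked` is a hypothetical stacked data frame and HHID is assumed to be unique across municipalities.

```latex
$$
\operatorname{logit}\,\Pr(\text{mig}_{tij} = 1 \mid \text{no migration before } t)
= \alpha_t + \beta_1\,\text{wx}_{tij} + \beta_2\,\text{fem}_{ij}
+ (\gamma_1 + u_{1ij})\,\text{Env1}_{tj} + \gamma_2\,\text{Env2}_{j}
+ u_{0ij} + v_{0j}
$$

$$
\begin{pmatrix} u_{0ij} \\ u_{1ij} \end{pmatrix} \sim N\!\left(\mathbf{0}, \Sigma_u\right),
\qquad v_{0j} \sim N\!\left(0, \sigma_v^2\right)
$$
```

The subscript tj on Env1 (with no i) makes explicit that it varies across years and municipalities but not across households within a municipality-year.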

References:

Courgeau, D. (2007). Multilevel synthesis: From the group to the individual. Dordrecht, The Netherlands: Springer.

Goldstein, H. (2011). Multilevel statistical models (4th ed.). Chichester, U.K.: John Wiley & Sons.
