My first thought would be to regress education (using a proportional odds model or whatever is appropriate for your education variable) on person-level variables and a few simple aggregates of transportation choice. The main aggregate that comes to mind is each person's proportion of train vs. bus rides (%train). If you only have two event-level variables -- distance and duration -- then another option would be to split that proportion by category: %train-near, %train-far, %train-short, %train-long.
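As a rough illustration of those aggregates, here is a minimal sketch in plain Python. The ride records, the field order, and the two cutoffs (`DIST_CUT_KM`, `DUR_CUT_MIN`) are made up for the example; I am also assuming that near/far refers to distance and short/long to duration, so adjust to however your data are coded.

```python
from collections import defaultdict

# Hypothetical ride records: (person_id, mode, distance_km, duration_min).
rides = [
    (1, "train", 2.0, 10),
    (1, "bus", 3.0, 25),
    (1, "train", 12.0, 30),
    (2, "bus", 15.0, 45),
    (2, "bus", 1.5, 8),
    (2, "train", 20.0, 40),
    (2, "train", 18.0, 35),
]

# Assumed cutoffs for dichotomizing the event-level variables.
DIST_CUT_KM = 5.0   # below this a ride counts as "near", otherwise "far"
DUR_CUT_MIN = 20    # below this a ride counts as "short", otherwise "long"


def train_share(records):
    """Proportion of train rides in records, or None if there are no rides."""
    if not records:
        return None
    return sum(1 for r in records if r[1] == "train") / len(records)


def person_aggregates(rides):
    """Collapse event-level rides into one row of aggregates per person."""
    by_person = defaultdict(list)
    for ride in rides:
        by_person[ride[0]].append(ride)

    out = {}
    for pid, recs in by_person.items():
        out[pid] = {
            "pct_train": train_share(recs),
            "pct_train_near": train_share([r for r in recs if r[2] < DIST_CUT_KM]),
            "pct_train_far": train_share([r for r in recs if r[2] >= DIST_CUT_KM]),
            "pct_train_short": train_share([r for r in recs if r[3] < DUR_CUT_MIN]),
            "pct_train_long": train_share([r for r in recs if r[3] >= DUR_CUT_MIN]),
        }
    return out


aggregates = person_aggregates(rides)
```

Each person then contributes a single row, which can be merged with the person-level variables and used as predictors in the education model. A person with no rides in a category gets `None` there, which you would have to handle as missing.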

If something simple like the above won't work because you have too many event-level variables or you're not willing to categorize them, then your first thought of using a logistic regression with random effects for person-level variables (I presume) is the right idea. However, I would modify your suggestion by using a structural equation model (SEM) to regress education on transportation choice, which is in turn regressed on the event- and person-level variables (except for education) and the random effects. Education can additionally be regressed directly on the event- and person-level variables. All regressions are estimated simultaneously. This can be done in Mplus, but as far as I know it is currently not possible in R, because none of the SEM packages (e.g., lavaan, sem) allow for mixed effects like those offered by the lme4 package. It can probably be done in SAS with a lot of coding. I have no idea about other software.

Is your second thought of regressing education on combinations of your predictors feasible given the number of combinations and the amount of data? How many event-level and person-level variables do you have?

Latent class regression wouldn't make sense for your data because individual response patterns aren't comparable (e.g., person 1 might have chosen 00 for near-short, near-short and person 2 might have chosen 0000 for far-long, far-long, far-long, far-long). You could recode the response vectors with a lot of missing values, but there are better approaches.

## Best Answer

1) With case-control data (selection on $Y$), how can you use logistic regression? Is it valid? How are the parameter estimates (and their interpretations) affected?
2) Tell us about over-dispersion in binomial data. What is it? Why is it a problem? What can you do about it? How is logistic regression affected? Are there alternatives to logistic regression in that case?
3) Tell us about modeling interactions in linear and logistic regression. How does it differ? How do the interpretations differ in those two cases?