Multinomial Logit – Understanding Interaction Between Alternative-Specific and Individual-Specific Variables in Conditional Logistic Regression

Tags: choice-modeling, interaction, mlogit, multinomial logit

I am interested in how a voter's political knowledge moderates the effect of that voter's candidate evaluations on vote choice, and I am trying to estimate conditional logit models to analyze this. At least that is what they are called in political science; I have already noticed that the terminology is not entirely consistent across disciplines.

All voters in my dataset have the same choice set of 6 parties. For my analysis I am using a stacked data matrix where every row constitutes a voter × party dyad. I would like to estimate an interaction effect between political knowledge (individual-specific) and candidate evaluation (alternative-specific) on vote choice. My problem is that, at least as far as I can tell from trying this with the mlogit package in R, it is not possible to include variables like political knowledge in these models, as there is no intra-individual variation: the political knowledge of voter A is the same for party 1, party 2, …, party 6. It is constant per individual, as are age, gender, etc.

Of course, I am now asking myself how I can properly model this. I have read some papers that get around this issue by regressing vote choice for every single party upon the individual-specific variable and using the predicted probabilities (y-hats) from this binary logistic model as an independent variable in the conditional logistic regression (see for example Gattermann & De Vreese 2017; Giebler & Wagner 2015). However, these papers do not use y-hats as a constituent factor of an interaction effect, and I have no idea if this would even be valid or how I would interpret such an interaction effect.

I have also found a paper that estimates only the interaction effect and excludes from the model the main effect of one of the factors constituting the interaction. I know that best practice is to include all constituent terms in the model (especially since it makes sense to expect a direct effect of political knowledge on vote choice), but maybe this is negligible in this case.

I would really appreciate feedback or guidance in this matter, as I am a bit stuck right now and I don't feel confident enough to choose one of the above mentioned alternatives. Maybe someone has a good argument for one of these alternatives, or maybe there are even other ways to solve this.

Feel free to let me know if you want me to describe the problem in more detail. Thank you!

EDIT1, 2022-03-11: I was asked to further specify my question. The dependent variable "vote choice" is a dummy, coded 1 when a respondent indicated voting for this party and 0 otherwise. Example data:

resp_id	party	vote_choice	cand_eval	pol_know
1	CDU/CSU	0	0.2	0.904761
1	SPD	0	0.5	0.904761
1	Greens	1	0.8	0.904761
1	FDP	0	0.3	0.904761
1	Left	0	0.6	0.904761
1	AfD	0	0.1	0.904761
2	CDU/CSU	0	0.1	0.761904
2	SPD	0	0.4	0.761904
2	Greens	0	0.6	0.761904
2	FDP	0	0.2	0.761904
2	Left	1	1.0	0.761904
2	AfD	0	0.0	0.761904

The vote choice is the only choice asked of the respondents and every respondent in the dataset made this choice. There are no cases who did not choose a party. The parties are not ranked, respondents were simply asked what party they would vote for (1 vote for 1 party). For every party, the evaluation of their one lead candidate was asked. There was no choice of candidates within a party.

Apart from the proposed interaction of political knowledge and candidate evaluation, I would like to include standard predictors of vote choice: an individual's party identification (dummy variable, 0 if not feeling close to the respective party, 1 if feeling close to it) and the distance between the individual's position and the party's position on the left-right dimension (continuous variable). Both predictors are alternative-specific. I would also like to include controls for sex (dummy m/f), age (continuous) and income (continuous), all individual-specific.

Best Answer

This seems to be a pretty straightforward multinomial logistic regression problem. You have 6 mutually exclusive unordered outcomes (the vote choice) and a set of predictors, evidently with complete data. The regression would directly evaluate the log-odds of choosing each of 5 parties against one party chosen as a reference, as a function of your predictor values (essentially 5 binary logistic regressions done together). You can then express results as probabilities for each of the 6 parties.

From the data sample you show, you might need to put your data into a wide form, with 1 row per individual.* The outcome would be the party choice, represented as an unordered categorical variable. In the process you would place the cand_eval values currently in separate rows for each party and individual into separate columns, one specific to each of the 6 parties, for each individual. Then you could examine the interaction between those party candidate evaluation metrics and political knowledge via interaction terms. Even though the political knowledge measure is constant for each individual, those interaction terms (and the candidate evaluations) wouldn't be. Measures that combine information from all of the 6 cand_eval values for each individual could also be considered as predictors.
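One way to write down such an interaction, sketched here in random-utility notation for the stacked (long) data rather than as the exact specification described above. The symbols $\alpha_j$, $\beta_1$, $\beta_2$ and $\gamma_j$ are illustrative labels that do not appear in the question, and the generic (party-invariant) coefficients on the candidate evaluation are an assumption:

$$ U_{ij} = \alpha_j + \beta_1\,\text{eval}_{ij} + \beta_2\,(\text{eval}_{ij} \times \text{know}_i) + \gamma_j\,\text{know}_i + \varepsilon_{ij} $$

$$ U_{ij} - U_{ik} = (\alpha_j - \alpha_k) + (\beta_1 + \beta_2\,\text{know}_i)(\text{eval}_{ij} - \text{eval}_{ik}) + (\gamma_j - \gamma_k)\,\text{know}_i + (\varepsilon_{ij} - \varepsilon_{ik}) $$

Under these assumptions the product term varies across parties for each voter $i$, so $\beta_2$ is identified even though $\text{know}_i$ is constant within a voter, and the effect of the candidate evaluation for voter $i$ is $\beta_1 + \beta_2\,\text{know}_i$. The main effect of political knowledge enters only through the party-specific $\gamma_j$, with one party's $\alpha_j$ and $\gamma_j$ normalized to zero.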

This UCLA website has links to how to perform multinomial logistic regression with any of 5 different software packages.

*There might be a way to do the multinomial logistic regression directly with the data as currently formatted, but I'm not sure. Moving from long-form data into wide form is now nicely supported by tools in the R tidyverse.

Related Solutions

Solved – Alternative Specific Variables in R

Why would you want to do this?

To begin with, variables like cost and travel time vary across alternatives, which actually makes them generic variables. The multinomial logit model is defined on the difference between two utility functions. Say you have two alternatives (1 and 2), where $x$ is the same in both. If you difference the utility equations, $\alpha_1 + \beta x - [\alpha_2 + \beta x] = \alpha_1 - \alpha_2$: the $\beta x$ terms cancel, so $\beta$ cannot be estimated. Instead you estimate alternative-specific parameters, $\alpha_1 + \beta_1 x - [\alpha_2 + \beta_2 x]$, which leaves the term $(\beta_1 - \beta_2) x$ in the difference and makes the effect of $x$ identifiable.

But if $x_1$ and $x_2$ are different, the equation is identifiable with a single $\beta$: $\alpha_1 + \beta x_1 - [\alpha_2 + \beta x_2] = (\alpha_1 - \alpha_2) + \beta (x_1 - x_2)$.
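The cancellation argument can be checked numerically. The following is a minimal base R sketch, not part of the original answer: `choice_prob()` is a hypothetical helper, and all parameter values are made up for illustration.

```r
# Multinomial logit choice probabilities for one decision maker.
# `u` holds the systematic utility of each alternative.
choice_prob <- function(u) {
  e <- exp(u - max(u))  # subtract the max for numerical stability
  e / sum(e)
}

alpha <- c(alt1 = 0.5, alt2 = 0)  # hypothetical intercepts
beta  <- 1.2                      # hypothetical generic coefficient

# Case 1: x is identical for both alternatives, so beta * x cancels.
x_same    <- 3
p_without <- choice_prob(alpha)
p_with    <- choice_prob(alpha + beta * x_same)
same_x_is_irrelevant <- isTRUE(all.equal(p_without, p_with))

# Case 2: x differs across alternatives, so a single beta matters.
x_diff <- c(alt1 = 3, alt2 = 1)
p_diff <- choice_prob(alpha + beta * x_diff)
diff_x_matters <- !isTRUE(all.equal(p_without, p_diff))

# Case 3: a constant individual-level value z multiplied by x_diff
# still varies across alternatives, so its coefficient matters too.
z       <- 0.9  # hypothetical individual-specific value
p_inter <- choice_prob(alpha + beta * x_diff + 0.8 * x_diff * z)
interaction_matters <- !isTRUE(all.equal(p_diff, p_inter))

print(c(same_x_is_irrelevant = same_x_is_irrelevant,
        diff_x_matters       = diff_x_matters,
        interaction_matters  = interaction_matters))
```

All three flags should come out `TRUE`: a variable that is constant across alternatives drops out of the probabilities when it has a generic coefficient, whereas an alternative-varying variable, or its product with an individual-level constant, does not.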

If you estimate this model (I'm only going to use one travel time)

$$ U_{train} = \alpha_{train} + \beta_{tt} (TT_{train})$$ $$ U_{auto} = \alpha_{auto} + \beta_{tt} (TT_{auto})$$

These utilities contain three parameters ($\alpha_{train}, \alpha_{auto}, \beta_{tt}$), although only the difference between the two intercepts is identified, so in practice one of them is fixed at zero. If you insist on estimating alternative-specific parameters $\beta_{tt:auto}, \beta_{tt:train}$, you spend a degree of freedom on a parameter that you don't actually need, which makes your model less efficient, with knock-on consequences for your hypothesis tests. Not to mention that cross-elasticities are much easier to calculate with generic coefficients...

You only need alternative-specific coefficients for variables that don't vary across alternatives, such as income or the distance between the start and end of the trip.

Okay, you want to do this anyways, so how do you do it?

There are a couple of ways that I might do this. First I'm going to build a simple dataset from the Biogeme swissmetro dataset.

library(dplyr)
library(mlogit)

# read.delim() expects a tab-delimited file; adjust the path to your own copy
swissmetro <- read.delim("~/Downloads/swissmetro.dat")

# Keep only train (1) and car (3) choices and rename the attributes so that
# reshape() can split each name into a variable part and an alternative part
swissmetro <- swissmetro %>% 
  filter(CHOICE %in% c(1, 3)) %>%
  mutate(choice = factor(CHOICE, labels = c("train", "car")),
         tt.train = TRAIN_TT, tt.car = CAR_TT, 
         cost.train = TRAIN_CO, cost.car = CAR_CO) %>%
  select(ID, choice, tt.train, tt.car, cost.train, cost.car)

# Wide to long: one row per observation-alternative pair
sm <- reshape(swissmetro, varying = 3:6, direction = "long")
sm <- sm %>% 
  mutate(choice = choice == time, alt = time) %>%  # TRUE on the chosen alternative's row
  arrange(id) %>% select(-time)
sm.mlogit <- mlogit.data(sm, choice = "choice", id.var = "ID", alt.var = "alt",
                         shape = "long")

head(sm.mlogit)
##         ID choice  tt cost id   alt
## 1.train  1   TRUE 103   36  1 train
## 1.car    1  FALSE  90   65  1   car
## 2.train  7   TRUE  80   42  2 train
## 2.car    7  FALSE  72  140  2   car
## 3.train  8  FALSE 100   22  3 train
## 3.car    8   TRUE  80   24  3   car

This replicates (what I think is) your data pretty well, though with only one travel time variable. We can and should treat both tt and cost as generic, giving us the most efficient model:

mnl1 <- mlogit(choice ~ tt + cost, data = sm.mlogit)
summary(mnl1)

## 
## Call:
## mlogit(formula = choice ~ tt + cost, data = sm.mlogit, method = "nr", 
##     print.level = 0)
## 
## Frequencies of alternatives:
##   car train 
## 0.684 0.316 
## 
## nr method
## 6 iterations, 0h:0m:0s 
## g'(-H)^-1g = 2.11E-05 
## successive function values within tolerance limits 
## 
## Coefficients :
##                    Estimate Std. Error t-value Pr(>|t|)    
## train:(intercept) -1.33e+00   4.56e-02  -29.16   <2e-16 ***
## tt                -2.57e-04   4.69e-04   -0.55     0.58    
## cost               1.40e-03   5.92e-05   23.71   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -2180
## McFadden R^2:  0.225 
## Likelihood ratio test : chisq = 1260 (p.value = <2e-16)

Attempt 1: intercept interaction

My first idea is to interact the generic variable with an alternative dummy (in effect, an intercept for that alternative), which is the same as forcing your missing $\beta_{train}$ to equal $0$.

sm.mlogit$car <- ifelse(sm.mlogit$alt == "train", 0, 1)
head(sm.mlogit)

##         ID choice  tt cost id   alt car
## 1.train  1   TRUE 103   36  1 train   0
## 1.car    1  FALSE  90   65  1   car   1
## 2.train  7   TRUE  80   42  2 train   0
## 2.car    7  FALSE  72  140  2   car   1
## 3.train  8  FALSE 100   22  3 train   0
## 3.car    8   TRUE  80   24  3   car   1

mnl2 <- mlogit(choice ~ tt + I(cost*car), data = sm.mlogit)
summary(mnl2)

## 
## Call:
## mlogit(formula = choice ~ tt + I(cost * car), data = sm.mlogit, 
##     method = "nr", print.level = 0)
## 
## Frequencies of alternatives:
##   car train 
## 0.684 0.316 
## 
## nr method
## 6 iterations, 0h:0m:0s 
## g'(-H)^-1g = 1.29E-06 
## successive function values within tolerance limits 
## 
## Coefficients :
##                   Estimate Std. Error t-value Pr(>|t|)    
## train:(intercept) 1.01e+00   7.58e-02   13.30   <2e-16 ***
## tt                7.21e-05   4.72e-04    0.15     0.88    
## I(cost * car)     2.84e-02   1.04e-03   27.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -2230
## McFadden R^2:  0.205 
## Likelihood ratio test : chisq = 1150 (p.value = <2e-16)
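Written out, the specification behind mnl2 appears to correspond to the following utilities. This is a sketch that assumes car is the reference alternative, as the reported `train:(intercept)` suggests. The label $\beta_{cost:car}$ for the coefficient on `I(cost * car)` is introduced here for illustration.

$$ U_{train} = \alpha_{train} + \beta_{tt}\,TT_{train} $$

$$ U_{car} = \beta_{tt}\,TT_{car} + \beta_{cost:car}\,Cost_{car} $$

Because the dummy `car` is 0 on every train row, the train cost never enters the model, which is equivalent to fixing its coefficient at zero.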

Solved – interaction term in conditional logistic regression

As stated here (http://www.ats.ucla.edu/stat/stata/library/sg124.pdf), interaction or effect modification...is performed by including and evaluating the significance of second or higher order terms involving the two or more variables that are postulated to possibly modify their respective effects.

I'm trying something similar with clogit() in R but have not found much information about it on the web, except this link (https://stackoverflow.com/questions/20977401/coxph-x-matrix-deemed-to-be-singular), which discusses the problems and errors encountered when using interaction terms with the function coxph(). Since clogit() is a wrapper around coxph(), I thought this could be useful.
