Solved – Clustered data in logistic regression analysis

categorical-data, cluster-sample, clustered-standard-errors, logistic, regression

I am running a logistic regression in an election setting. My dependent variable is voting for the incumbent candidate (Yes/No) and my independent variable is the respondent's perception of the economy (survey data). The model controls for gender, age, etc.

I am working with data from two specific elections, which doesn't allow for a multilevel regression analysis. I now have to choose between a dummy variable for the elections (fixed effects) and clustered standard errors to account for the clustering within the two elections.

I am a little confused about how to set up a dummy variable for the elections. What would you recommend, and how would you go about doing that?

Best Answer

The first thing you should do is figure out why you want one model fit to data from both elections. Do you think these two elections are representative of, and generalizable to, other elections? Or are you just curious about them in isolation? For the dummy itself, pick which election you would like to see listed in the regression summary and code it as 1; the other election is coded 0 and serves as the baseline.
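To make the dummy coding concrete, here is a minimal sketch in plain Python. The election labels (`"2008"`, `"2012"`), the field names, and the four rows are invented for illustration; they are not from the question.

```python
# Hypothetical survey rows; labels and field names are assumptions.
respondents = [
    {"election": "2008", "voted_incumbent": 1},
    {"election": "2012", "voted_incumbent": 0},
    {"election": "2012", "voted_incumbent": 1},
    {"election": "2008", "voted_incumbent": 0},
]

# The election coded 1 gets its own coefficient in the summary;
# the other election (coded 0) is absorbed into the intercept as the baseline.
INDICATED = "2012"

for row in respondents:
    row["election_2012"] = 1 if row["election"] == INDICATED else 0

dummy = [row["election_2012"] for row in respondents]
print(dummy)  # [0, 1, 1, 0]
```

Which election you code as 1 only changes the sign of the dummy's coefficient and the meaning of the intercept; the other coefficients and the fitted probabilities stay the same.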

There will always be strong unobserved (or unobservable) variable bias in these types of models, because it is hard to know what is shaping voter perceptions. Fixed effects at a local level, such as MSA or ZIP code, might help control for some of that (or state-level fixed effects if you are talking about presidential elections).

What statistical software are you using: SAS, Stata, R? Implementing fixed effects just requires specifying the level at which you would like to perform the analysis. Regardless, you should probably report the original and fixed-effects estimates next to each other for comparison.
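As a rough illustration of reporting the two specifications side by side, here is a sketch in Python that uses only NumPy. Everything in it is an assumption made for the example: the data are simulated (not survey results), the "true" coefficients are arbitrary, and `fit_logit` is a hypothetical helper written here to show the mechanics. In practice you would use your statistics package's own logit routine.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Hypothetical helper: logistic regression by Newton-Raphson.

    Returns the coefficient vector and conventional (non-clustered)
    standard errors from the inverse of the information matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    # Recompute the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    return beta, se

# Simulated data; the coefficients below are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 2000
election = rng.integers(0, 2, size=n)   # 0 = baseline election, 1 = the other
economy = rng.normal(size=n)            # stand-in for perceived economy
true_logit = -0.5 + 0.8 * economy + 0.6 * election
vote = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ones = np.ones(n)
X_pooled = np.column_stack([ones, economy])            # original model
X_dummy = np.column_stack([ones, economy, election])   # with election dummy

beta_pooled, se_pooled = fit_logit(X_pooled, vote)
beta_dummy, se_dummy = fit_logit(X_dummy, vote)

print("columns:    intercept, economy[, election]")
print("pooled:    ", np.round(beta_pooled, 3), "SE", np.round(se_pooled, 3))
print("with dummy:", np.round(beta_dummy, 3), "SE", np.round(se_dummy, 3))
```

Putting the two rows next to each other shows how much the economy coefficient moves once the election-level shift is absorbed by the dummy. On the alternative raised in the question, note that cluster-robust standard errors rely on having many clusters, so with only two elections they are generally not considered reliable.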
