Solved – How to decide between logistic regression and conditional logistic regression

Tags: clogit, logistic, survey

I have a case-control study in which the cases are firms with health insurance and the controls are firms without it. I am studying the factors that affect enrolment in health insurance, so I was using a logistic regression with several covariates on firm characteristics that were measured in a survey. I randomly sampled the firms from a database that contains two strata, insured and uninsured firms, and selected 65 from each. Within each of these two groups, however, I also sampled from four strata that correspond to industry. I am therefore wondering whether I need to use conditional logistic regression rather than unconditional logistic regression. I was under the impression that conditional logistic regression was for matched case-control studies or panel studies, but in other feedback I have been told that because I sampled on the outcome, I need to use the conditional model. Could someone please help me figure out which model to use? Any references would also be much appreciated. Thank you.

Best Answer

I don't agree that you sampled on the outcome, since you sampled on company and enrollment is your outcome. You may want to treat the company as a random effect and the other features as fixed effects. So I am suggesting yet a third alternative: generalized linear mixed models.

After clarification: If the outcome is company enrollment rather than employee enrollment, then it is an ordinary case-control study for which unconditional logistic regression should be the standard approach. Conditional logistic regression is not necessary unless there were further conditions on the sampling regarding other company features.
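To illustrate why sampling on the outcome does not by itself call for the conditional model, here is a minimal sketch in Python. It assumes a single binary firm characteristic, and every count and sampling fraction in it is invented for illustration. It compares the full population table with the table you would expect after sampling cases and controls at different rates: the odds ratio (the slope of a logistic model with one binary covariate) is unchanged, and only the intercept shifts, by the log of the ratio of the two sampling fractions.

```python
import math

# Hypothetical population of firms, cross-classified by a binary covariate
# ("exposed" could stand for, say, large firms) and by insurance status.
# All counts are invented for illustration.
population = {
    ("exposed", "insured"): 400,
    ("unexposed", "insured"): 600,
    ("exposed", "uninsured"): 1000,
    ("unexposed", "uninsured"): 8000,
}

def odds_ratio(table):
    """Odds ratio of being insured, exposed vs. unexposed firms."""
    a = table[("exposed", "insured")]
    b = table[("unexposed", "insured")]
    c = table[("exposed", "uninsured")]
    d = table[("unexposed", "uninsured")]
    return (a * d) / (b * c)

def intercept(table):
    """Logistic intercept: log odds of being insured among unexposed firms."""
    return math.log(table[("unexposed", "insured")]
                    / table[("unexposed", "uninsured")])

def sample_on_outcome(table, frac_cases, frac_controls):
    """Expected cell counts when cases and controls are sampled at different rates."""
    frac = {"insured": frac_cases, "uninsured": frac_controls}
    return {cell: n * frac[cell[1]] for cell, n in table.items()}

# Hypothetical sampling fractions: 10% of insured firms, 1% of uninsured firms.
FRAC_CASES, FRAC_CONTROLS = 0.1, 0.01
sampled = sample_on_outcome(population, FRAC_CASES, FRAC_CONTROLS)

print("odds ratio, population:", odds_ratio(population))
print("odds ratio, case-control sample:", odds_ratio(sampled))
print("intercept shift:", intercept(sampled) - intercept(population))
print("log(frac_cases / frac_controls):", math.log(FRAC_CASES / FRAC_CONTROLS))
```

The sketch works with expected counts rather than a random draw, so it shows the algebra and not the sampling variability you would see in a real sample of 65 + 65 firms.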

Further clarification: If you are using R, the package to identify and install is, not surprisingly, "survey" by Thomas Lumley. It lets you specify the two-way stratified sampling strategy you have outlined in a design phase, prior to estimation with the svyglm() function. Stata also has a set of survey functions, and I imagine they can be used with the generalized linear modeling functions it provides. SAS didn't have such facilities in the past, so the SUDAAN program was needed as an added (expensive) purchase, but I have a vague memory that this may have changed with its latest releases. (I don't know about SPSS with regard to sampling support for GLMs.)
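As a rough idea of what the design phase encodes, here is a minimal Python sketch of the design weights for a sample stratified by insurance status and industry. Only the total of 65 sampled firms per insurance group comes from the question. The industry names, the frame counts, and the allocation of the 65 firms across industries are all hypothetical. Each sampled firm gets weight N_h / n_h, the number of firms in its stratum of the database divided by the number sampled from that stratum.

```python
# Hypothetical number of firms in the database (sampling frame) per stratum.
# Strata are (insurance status, industry); all values are invented.
frame = {
    ("insured", "manufacturing"): 200,
    ("insured", "retail"): 150,
    ("insured", "services"): 300,
    ("insured", "construction"): 150,
    ("uninsured", "manufacturing"): 1000,
    ("uninsured", "retail"): 600,
    ("uninsured", "services"): 900,
    ("uninsured", "construction"): 1500,
}

# Hypothetical allocation of the sample: 65 firms per insurance group,
# split across the four industry strata.
sample = {
    ("insured", "manufacturing"): 20,
    ("insured", "retail"): 15,
    ("insured", "services"): 15,
    ("insured", "construction"): 15,
    ("uninsured", "manufacturing"): 20,
    ("uninsured", "retail"): 15,
    ("uninsured", "services"): 15,
    ("uninsured", "construction"): 15,
}

# Design weight per stratum: how many frame firms each sampled firm represents.
weights = {stratum: frame[stratum] / sample[stratum] for stratum in sample}

for stratum, w in weights.items():
    print(stratum, "n =", sample[stratum], "weight =", w)

# Sanity check: the weighted sample size should reproduce the frame total.
weighted_total = sum(weights[s] * sample[s] for s in sample)
print("weighted total:", weighted_total, "frame total:", sum(frame.values()))
```

Survey software takes the strata and the weights (or the frame counts) as part of the design specification and uses them both for the point estimates and for the standard errors. The sketch covers only the weights, not the variance estimation.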