Solved – Logistic regression, SPSS ignores the reference category and assumes another one


I am modelling logistic regressions in SPSS, fitting the same model for different countries (with slight differences in the set of independent variables, due to collinearity diagnostics and stepwise results). The model seems to work fine for most countries, but in two countries I am having issues with the same variable. The variable has three categories (hierarchical). In the output for those two beautiful countries, it seems that SPSS drops the reference category, treats another category as the reference, and reports Exp(B) only for the last category.

Is something going wrong here? What should / can I do about it?

Thanks a lot for your answers!
The problem cannot be in the syntax, since I am using the same syntax for all countries and it works perfectly for the others.

I am using ENTER. STEPWISE was used only as an exploratory method to identify the strongest predictors. According to the -2LL values, my variable is not a 'strong' one for these two countries (0.12%). It is not the first time I have read that STEPWISE is a bad choice… I will take that into account!
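One way to read a -2LL comparison is as a likelihood-ratio test. The sketch below is a minimal, stdlib-only illustration, assuming the 0.12% refers to the relative change in -2LL when the variable is added (that reading is an assumption). The function name and the -2LL values in the usage lines are hypothetical. A three-category variable with indicator coding adds two dummy columns, so the test has 2 degrees of freedom; in a sample where one category has no cases, only one dummy is estimable, so df = 1.

```python
import math


def lr_test(neg2ll_without: float, neg2ll_with: float, df: int) -> tuple:
    """Likelihood-ratio test for adding a categorical variable to a logistic model.

    neg2ll_without : -2 log-likelihood of the model without the variable
    neg2ll_with    : -2 log-likelihood of the model with the variable
    df             : number of dummy columns the variable adds (1 or 2 here)
    Returns (chi-square statistic, p-value, relative change in -2LL in percent).
    """
    stat = neg2ll_without - neg2ll_with  # drop in -2LL, chi-square under H0
    if stat < 0:
        raise ValueError("adding a variable cannot increase -2LL")
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    elif df == 2:
        p = math.exp(-stat / 2.0)  # chi-square(2) upper tail has this closed form
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    rel_change_pct = 100.0 * stat / neg2ll_without
    return stat, p, rel_change_pct


# Hypothetical -2LL values, not taken from the question.
stat, p, rel = lr_test(1000.0, 998.8, df=2)
print(f"chi2 = {stat:.2f}, p = {p:.3f}, change in -2LL = {rel:.2f}%")
```

A relative change in -2LL is not itself a test: in a large sample a tiny relative change can still be significant, so the chi-square p-value is the more informative number.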

Nevertheless, why does SPSS do that?
Is this a real problem, meaning something that I must solve?
Or can I use/report the odds ratios for those two countries without that category, adding a footnote?
Should I run the model without that variable (theoretically, the variable is not that important)?

Thanks 😉

Hi, all my variables are recoded so that the reference category is 1, and most of my variables have three categories. I am using ENTER with the following syntax:

    LOGISTIC REGRESSION VAR= "DV"
    /METHOD=ENTER "IV's"
    /CONTRAST ("IV")=Indicator (1) […]
    /PRINT=GOODFIT CI(95)
    /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

Do you have any idea what is going wrong?
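To see why an empty reference category causes trouble, the sketch below dummy-codes a three-category variable the way /CONTRAST(...)=Indicator(1) describes: category 1 is the baseline (all zeros) and categories 2 and 3 each get a 0/1 column. The function names are hypothetical and this is not SPSS's internal code. If a country has no cases in category 1, every case falls in category 2 or 3, so the two dummy columns add up to 1 in every row and duplicate the constant; category 1 then cannot act as the baseline. Exactly how SPSS resolves this is not modelled here.

```python
def indicator_code(values, levels, ref_position=0):
    """Dummy-code `values`, using the category at `ref_position` in `levels` as baseline.

    Mirrors the idea behind /CONTRAST(var)=Indicator(1): the reference category
    is coded all zeros, every other category gets its own 0/1 column.
    """
    non_ref = [lvl for i, lvl in enumerate(levels) if i != ref_position]
    return [[1 if v == lvl else 0 for lvl in non_ref] for v in values]


def duplicates_constant(rows):
    """True if the dummy columns sum to 1 in every row, i.e. reproduce the intercept."""
    return bool(rows) and all(sum(row) == 1 for row in rows)


levels = [1, 2, 3]  # category codes, reference = 1 as in the question

# Hypothetical country where all three categories occur.
print(duplicates_constant(indicator_code([1, 2, 3, 2, 1], levels)))
# Hypothetical country with no cases in the reference category.
print(duplicates_constant(indicator_code([2, 3, 3, 2, 2], levels)))
```

The first check prints False and the second prints True: only in the second case do the dummies carry the same information as the constant.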

Best Answer

I suspect your problem is that you are using stepwise model selection. Automatic model selection algorithms, such as stepwise selection, don't have any 'knowledge' about what the variables mean or how they ought to relate to each other. For example, it is standard advice that if you include a squared term (e.g., $X_j^2$) in a model, you should include the lower-order term ($X_j$) as well, even if it's not 'significant'. However, the selection algorithm doesn't know there is any relationship between the two variables, and so you commonly have situations where it will throw out the lower-order term but retain the squared term. Likewise, all levels of your categorical variable should be retained or excluded together, but the stepwise algorithm doesn't necessarily 'know' that. I believe this is your problem.
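The two conventions above can be checked mechanically. The sketch below is a hypothetical post-hoc check, not something SPSS provides: given the columns a selection procedure kept, it flags a categorical variable whose dummy columns were only partly retained, and a squared term kept without its lower-order term. The column names (educ(1), educ(2), x, x_sq, age) are made up for illustration.

```python
def hierarchy_violations(selected, term_groups, parents):
    """List ways a set of selected model terms breaks two conventions.

    selected    : set of column names a selection procedure kept
    term_groups : categorical variable -> its dummy columns (keep all or none)
    parents     : higher-order term -> lower-order term it requires
    """
    problems = []
    for var, cols in term_groups.items():
        kept = [c for c in cols if c in selected]
        if 0 < len(kept) < len(cols):  # some, but not all, levels survived
            dropped = [c for c in cols if c not in selected]
            problems.append(f"{var}: kept {kept} but dropped {dropped}")
    for term, parent in parents.items():
        if term in selected and parent not in selected:
            problems.append(f"{term} kept without its lower-order term {parent}")
    return problems


# Hypothetical result of an automatic selection run.
selected = {"educ(2)", "x_sq", "age"}
groups = {"educ": ["educ(1)", "educ(2)"]}
parents = {"x_sq": "x"}
for message in hierarchy_violations(selected, groups, parents):
    print(message)
```

Passing this check does not make stepwise selection sound; it only catches this particular symptom.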

Of course, even if you were to address this issue, stepwise selection algorithms are a truly horrible modeling strategy, and still should not be used. For more on that topic, it may help you to read my answer here: Algorithms for automatic model selection.

As the above suggests, the way to rectify this problem is to stop using stepwise selection. Aside from the problem described in the post, it does not do what people believe it does (i.e., tell them which variables are the 'real' or 'important' ones).

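Based on your edit, I'm guessing there may be a different issue. It is possible that, in those two countries, there are no observations (cases) in one level of the categorical variable, most likely your reference category. An indicator-coded variable needs cases in its reference category for that category to serve as the baseline; if it is empty, SPSS cannot use it, and you end up with another level acting as the reference and only one Exp(B). You should check to see if that is the case. If so, see if one of the other levels of the categorical variable has instances for every country, and use that one as your reference level instead.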
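The check suggested above amounts to a cross-tabulation of the variable by country (in SPSS, CROSSTABS produces this table). As a sketch outside SPSS, assuming the data have been exported as (country, category) pairs, the hypothetical functions below count cases per cell, list empty cells, and list the categories observed in every country, any of which could serve as a shared reference level.

```python
from collections import Counter, defaultdict


def level_counts(records):
    """Count cases per (country, category) from an iterable of (country, category) pairs."""
    counts = defaultdict(Counter)
    for country, level in records:
        counts[country][level] += 1
    return counts


def empty_cells(counts, levels):
    """(country, category) pairs with no cases; a reference category must not appear here."""
    return [(c, lvl) for c in sorted(counts) for lvl in levels if counts[c][lvl] == 0]


def shared_levels(counts, levels):
    """Categories with at least one case in every country (candidate common references)."""
    return [lvl for lvl in levels if all(counts[c][lvl] > 0 for c in counts)]


# Hypothetical data: country "B" has no cases in category 1.
records = [("A", 1), ("A", 2), ("A", 3), ("B", 2), ("B", 3), ("B", 3)]
counts = level_counts(records)
print(empty_cells(counts, [1, 2, 3]))    # [('B', 1)]
print(shared_levels(counts, [1, 2, 3]))  # [2, 3]
```

Here category 1 is empty in country B, so only category 2 or 3 could be used as a reference common to both countries.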
