How to justify the use of categorical variables as continuous variables in logistic regression

Tags: categorical-data, logistic, regression

One more question to clarify: can I use the variables listed below under (3) (a, b, c, etc.) as continuous variables in my logistic regression, and if so, how should I justify this in the paper I am writing?

I have the following sets of variables:

  1. A categorical (binary) variable: Ayurveda vs. Allopathy
  2. A binary test variable, "Spirituality is a scientific subject": Agree / Disagree
  3. A number of participant perspectives/characteristics, such as:

    • (a) Do you believe there is life after death? 1) Yes 2) No 3) Not sure
    • (b) To what extent do you consider yourself spiritual? 1) Very 2) Moderately 3) Slightly 4) Not at all
    • (c) How often would you say the experience of illness increases patients’ awareness of and focus on R/S? 1) Rarely 2) Never 3) Sometimes 4) Often 5) Always 6) Does not apply
    • (d) etc.: several more variables with multiple-choice answers like those above

Please advise.

Best Answer

I would say no. Including these categorical variables as continuous regressors assumes that a one-unit change in any of the multiple-choice variables has the same effect on the log-odds of the outcome, wherever on the scale that change occurs. For example, you are assuming that going from [1) Very] to [2) Moderate] has the same effect as going from [3) Slightly] to [4) Not at all].
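To make the restriction concrete, here is a small Python sketch (Python is used only for illustration). The coefficients B0, B1 and the per-level effects in DUMMY_EFFECTS are made-up numbers, not estimates from any data. It compares the change in log-odds between adjacent levels of item (b) when the code 1 to 4 enters the model as a number versus when each level gets its own effect.

```python
from typing import Callable

# Hypothetical coefficients, for illustration only (not estimated from any data).
B0, B1 = -1.0, 0.5  # intercept and slope when the code x is treated as continuous

LEVELS = {1: "very", 2: "moderately", 3: "slightly", 4: "not at all"}

# Hypothetical dummy-coded effects relative to the reference level 1 ("very").
DUMMY_EFFECTS = {1: 0.0, 2: 0.1, 3: 0.9, 4: 1.0}


def log_odds_continuous(x: int) -> float:
    """Linear predictor when the category code x enters the model as a number."""
    return B0 + B1 * x


def log_odds_dummy(x: int) -> float:
    """Linear predictor when each level has its own (hypothetical) coefficient."""
    return B0 + DUMMY_EFFECTS[x]


def adjacent_steps(f: Callable[[int], float]) -> list[float]:
    """Change in log-odds for each move to the next level (1->2, 2->3, 3->4)."""
    codes = sorted(LEVELS)
    return [f(b) - f(a) for a, b in zip(codes, codes[1:])]


if __name__ == "__main__":
    # Continuous coding: every step is forced to equal B1.
    print("continuous:", adjacent_steps(log_odds_continuous))
    # Dummy coding: each step can take its own value.
    print("dummy:     ", [round(s, 2) for s in adjacent_steps(log_odds_dummy)])
```

Under the continuous coding every adjacent step equals B1 (0.5 here). Under dummy coding the steps (about 0.1, 0.8 and 0.1 with these invented numbers) are free to differ.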

To me this is an overly restrictive assumption. Fortunately, it is straightforward to estimate the model without it: include the categorical regressors as factor variables. This breaks each categorical regressor down into a series of dummy variables, one for each level except a reference level. To do this in R, wrap the variable in factor() inside the glm() call:

glm(y ~ factor(x), family = binomial())

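or, in Stata, use the xi: prefix together with the i. operator:

xi: logit y i.x

(In Stata 11 and later, factor-variable notation also works without xi: logit y i.x)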
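For readers who want to see what "a series of dummy variables" looks like, here is a minimal Python sketch of treatment (reference-level) coding. It imitates the default behaviour of factor() in an R model formula for an unordered factor (the first level is the reference and gets no column), but it is not R's implementation. The function name dummy_code and the example answers are hypothetical.

```python
from typing import Sequence


def dummy_code(values: Sequence[str], levels: Sequence[str]) -> list[list[int]]:
    """Treatment (reference-level) coding: one 0/1 column per non-reference level.

    levels[0] is the reference level, so it gets no column; a row of all
    zeros means the observation is at the reference level.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    non_ref = list(levels[1:])
    return [[1 if v == lvl else 0 for lvl in non_ref] for v in values]


# Hypothetical answers to item (b); "very" is chosen as the reference level.
LEVELS = ["very", "moderately", "slightly", "not at all"]
ANSWERS = ["moderately", "very", "not at all", "slightly", "moderately"]

if __name__ == "__main__":
    for answer, row in zip(ANSWERS, dummy_code(ANSWERS, LEVELS)):
        print(f"{answer:>12} -> {row}")
```

With four levels this yields three indicator columns, so the logistic regression estimates three separate coefficients (each a log-odds difference from the reference level) instead of a single slope.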