Binary Data – Guidelines for Converting an Ordinal Variable to a Binary Variable


I have seen some people convert their ordinal variable to a binary one, especially in the public opinion literature. For instance, when there is a four-point question with the responses "Strongly agree," "Agree," "Disagree," and "Strongly disagree," some authors simply code "Strongly agree" and "Agree" as 1 and the other two responses as 0. They then analyze the data with a logistic regression model. I am not sure why they don't simply use the ordinal data with an ordered logistic model.
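The recoding described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the responses are stored as the four text labels; the names `AGREE_LABELS`, `ALL_LABELS`, and `dichotomize` are hypothetical.

```python
# Minimal sketch: collapse a four-point agreement item into a binary indicator.
# Assumes responses are stored as text labels; all names here are hypothetical.

ALL_LABELS = ["Strongly agree", "Agree", "Disagree", "Strongly disagree"]
AGREE_LABELS = {"Strongly agree", "Agree"}


def dichotomize(response: str) -> int:
    """Return 1 for any level of agreement, 0 for any level of disagreement."""
    if response not in ALL_LABELS:
        raise ValueError(f"unexpected response: {response!r}")
    return 1 if response in AGREE_LABELS else 0


responses = ["Agree", "Strongly disagree", "Strongly agree", "Disagree", "Agree"]
binary = [dichotomize(r) for r in responses]
print(binary)  # [1, 0, 1, 0, 1]
```

The resulting 0/1 variable is what would go into a logistic regression, whereas an ordered logistic model would use the original four ordered levels.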

When do we need to convert our ordinal data to a binary variable?

Best Answer

While I am not sure there is ever a time that one needs to convert ordinal data to binary, there are times when it may be more appropriate.

First, the authors may simply opt for a simpler model. That is to say, a logistic model is easier to run and interpret than an ordinal model, and it involves fewer assumptions to test (most notably, the ordered logit's proportional-odds assumption). Following from this, the results of a logistic model are also easier to explain than those of an ordinal model. Of course, one can argue that the model that best fits the data should be run, but I have had editors and reviewers ask me to scale back to a simpler model to accommodate the general audience of the journal. So it is always good to keep the audience in mind.
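One way to see why the binary model is simpler is to write the two models side by side. This is a sketch, assuming responses are coded Y_i in {1, 2, 3, 4} from "Strongly disagree" (1) to "Strongly agree" (4) with covariates x_i; the symbols theta_j, beta, alpha, gamma, and Z_i are introduced here for illustration and follow the common cumulative-logit parameterization.

```latex
% Ordered (cumulative) logit: J - 1 = 3 thresholds, one shared slope vector beta
\operatorname{logit} \Pr(Y_i \le j \mid x_i) = \theta_j - x_i^\top \beta,
  \qquad j = 1, 2, 3, \quad \theta_1 < \theta_2 < \theta_3

% Binary logit after collapsing: Z_i = 1 if Y_i \in \{3, 4\} (any agreement), else 0
\operatorname{logit} \Pr(Z_i = 1 \mid x_i) = \alpha + x_i^\top \gamma

% Since Pr(Z_i = 1) = 1 - Pr(Y_i <= 2) and logit(1 - p) = -logit(p):
\operatorname{logit} \Pr(Z_i = 1 \mid x_i) = -\theta_2 + x_i^\top \beta
  \;\Rightarrow\; \alpha = -\theta_2, \quad \gamma = \beta
  \quad \text{(if the ordered model holds)}
```

In this framing the binary model estimates one cutpoint, while the ordered model estimates three and assumes the same beta applies at every cutpoint, which is the proportional-odds assumption that would need checking.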

Second, it may be a matter of cell sizes. For example, if you have very few responses in one or more categories, model estimation may become difficult or unstable. One workaround is to collapse neighboring categories into a single category. For example, if there are too few "Strongly disagree" responses, combining the "Disagree" and "Strongly disagree" categories into one group may help estimation. Note that this strategy is often employed in item response theory (IRT) for ordinal data.
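The collapsing step can be sketched in Python as follows, assuming responses are stored as text labels. The minimum count (`MIN_COUNT = 5`) is an arbitrary value chosen for illustration, not a rule from the answer or from IRT practice, and the helper name `collapse_sparse_extremes` is hypothetical.

```python
# Minimal sketch: check category counts and merge a sparse extreme category
# into its neighbour. MIN_COUNT and the helper name are hypothetical choices.
from collections import Counter

LEVELS = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
MIN_COUNT = 5  # hypothetical minimum number of responses per category


def collapse_sparse_extremes(responses, min_count=MIN_COUNT):
    """Merge each end category into its adjacent category if it is too sparse.

    Returns the recoded responses and the resulting category counts.
    """
    counts = Counter(responses)  # missing categories count as 0
    mapping = {level: level for level in LEVELS}
    # Only the two end categories are merged inward, so the order is preserved.
    if counts["Strongly disagree"] < min_count:
        mapping["Strongly disagree"] = "Strongly disagree or disagree"
        mapping["Disagree"] = "Strongly disagree or disagree"
    if counts["Strongly agree"] < min_count:
        mapping["Agree"] = "Agree or strongly agree"
        mapping["Strongly agree"] = "Agree or strongly agree"
    recoded = [mapping[r] for r in responses]
    return recoded, Counter(recoded)


# Example: "Strongly disagree" has only 2 responses, so it is merged with "Disagree";
# "Strongly agree" has 6 responses and is kept as its own category.
responses = (["Strongly disagree"] * 2 + ["Disagree"] * 10
             + ["Agree"] * 12 + ["Strongly agree"] * 6)
recoded, new_counts = collapse_sparse_extremes(responses)
print(dict(new_counts))
```

Merging only adjacent categories keeps the remaining levels ordered, so the collapsed variable can still be fit with an ordered model; collapsing all the way down to two groups gives the binary case from the question.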

Third, there may be a theoretical or conceptual motivation for collapsing the data. For example, if you assume that response styles are present in the data (i.e., different people use the response scale in different ways), then you may argue that the only "true" distinction is whether someone agrees (to any degree) or disagrees (to any degree). In that case, the research question most likely focuses only on that comparison, not on degrees of agreement or disagreement.

I hope this is helpful.
