Is it acceptable to use linear regression to predict a 4-point Likert scale from a 7-point Likert scale?

likert, regression, spss

My dependent variable is measured on a 4-point Likert scale and my independent variable is measured on a 7-point Likert scale. Is it appropriate to run a regression analysis on data whose Likert scales differ in length, specifically predicting a 4-point Likert scale from a 7-point Likert scale?

Best Answer

It is generally fine to use predictor and outcome variables that use different metrics when performing multiple regression.

To demonstrate the point, you can rescale predictor or dependent variables using a linear transformation (e.g., z-scores, centering, and so on), and this will not change your $R^2$ or your standardised regression coefficients (note that I'm not saying you should do this; I'm just pointing out that this aspect of scaling is not the issue). Of course, using a 4- or 7-point response scale involves more than just rescaling, but in my experience, correlations and $R^2$ won't change a lot based on whether you use a 4- or 7-point scale.
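As a quick check of the rescaling claim, here is a minimal NumPy sketch on simulated data. The data, the coefficients used to generate them, and the helper names `fit_ols` and `standardised_slope` are all made up for illustration. It fits a simple regression before and after centring the predictor and z-scoring the outcome; $R^2$ and the standardised slope should come out the same.

```python
import numpy as np

def fit_ols(x, y):
    """OLS of y on x with an intercept; returns (slope, r_squared)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta[1], r2

def standardised_slope(x, y):
    """Slope after converting both variables to z-scores."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    slope, _ = fit_ols(zx, zy)
    return slope

rng = np.random.default_rng(0)
# Hypothetical data: a 7-point predictor (1..7) and a 4-point outcome (1..4)
x7 = rng.integers(1, 8, size=200).astype(float)
y4 = np.clip(np.round(0.4 * x7 + rng.normal(0, 0.8, size=200)), 1, 4)

# Linear rescalings: centre the predictor, z-score the outcome
x_centred = x7 - x7.mean()
y_z = (y4 - y4.mean()) / y4.std()

_, r2_raw = fit_ols(x7, y4)
_, r2_rescaled = fit_ols(x_centred, y_z)
print("R^2 unchanged:", np.isclose(r2_raw, r2_rescaled))
print("standardised slope unchanged:",
      np.isclose(standardised_slope(x7, y4), standardised_slope(x_centred, y_z)))
```

In a simple regression the standardised slope is just the Pearson correlation, which is why any positive linear rescaling leaves it untouched.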

That said, there are several issues to consider when your predictor or dependent variables are single-item variables with a small number of ordered response options:

  • What is the best response scale for measuring the variable of interest? If you are designing a study, you may want to think about the optimal number of response options. This is much debated. Some people argue that you should have more response options (e.g., a 7- or 10-point scale). Others suggest that you should align the response options with the meaningful distinctions that respondents are able to make, and that too many response options can lead to more person-specific anchoring effects; such arguments are often used to justify 5-point scales.
  • What is the best way to measure the variable of interest? If you truly have a single-item measure on a four- or seven-point scale, you would often be better served by developing a scale with multiple items that you then sum to form an overall measure. This will tend to be more reliable and to discriminate better between respondents. Both of these factors may improve prediction.
  • Can you include an item with four ordered response options as a dependent variable in a linear regression? There are different answers to this. Certainly, it is possible, and many people do it. Of course, the residuals won't be normally distributed, and the model assumes that you are happy treating the response categories as equally distant. There are alternative techniques that model ordinal data more explicitly (such as ordinal logistic regression). In practice, people are generally more willing to use linear regression as the number of categories increases. Thus, if your dependent variable were the sum of a few items, each on a four-point scale, linear regression would seem more appropriate. Four options on a single item is on the low side.
  • Can you include an item with seven ordered response options as a predictor variable in a linear regression? Yes, this is fine. There are many options for how you numerically code the variable. The standard approach is to treat the categories as equally distant. Of course, you could explore other codings (there is even optimal scaling, which attempts to optimise the coding of the variable subject to constraints such as ordinality). Or you could include both a linear and a quadratic term for the variable to capture a non-linear effect.
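To make the coding options for a seven-point predictor concrete, here is a rough NumPy sketch on simulated data. The curved relationship used to generate the outcome and the helper `r_squared` are hypothetical. It compares equal-interval (linear) coding, linear plus quadratic coding, and, as an extra comparison not mentioned above, dummy coding that estimates a separate effect for each of the seven categories. Optimal scaling is not shown.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 300
# Hypothetical 7-point predictor coded 1..7 (equal spacing assumed)
x = rng.integers(1, 8, size=n).astype(float)
# Hypothetical outcome with a curved relationship to x
y = 2 + 0.8 * x - 0.08 * x ** 2 + rng.normal(0, 0.5, size=n)

ones = np.ones(n)
# Centre before squaring to reduce collinearity between linear and quadratic terms
xc = x - x.mean()
X_linear = np.column_stack([ones, xc])
X_quadratic = np.column_stack([ones, xc, xc ** 2])
# Dummy coding: category 1 is the reference, one indicator for each of 2..7
X_dummy = np.column_stack([ones] + [(x == k).astype(float) for k in range(2, 8)])

r2_lin = r_squared(X_linear, y)
r2_quad = r_squared(X_quadratic, y)
r2_dummy = r_squared(X_dummy, y)
print(f"linear coding R^2:             {r2_lin:.3f}")
print(f"linear + quadratic coding R^2: {r2_quad:.3f}")
print(f"dummy coding R^2:              {r2_dummy:.3f}")
```

Because the three codings are nested, $R^2$ can only stay the same or rise as you move from linear to quadratic to dummy coding; the practical question is whether the gain justifies the extra parameters.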

Note that most of the above was written on the initial assumption that your predictor and outcome variables were single items. If you have multi-item scales that just happen to use different response scales, then there's not much to think about. Most people treat such scales as standard numeric variables in their multiple regressions.
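For the multi-item case, here is a minimal sketch on simulated data that sums several 4-point items into a scale score and reports Cronbach's alpha, a common internal-consistency estimate. The five items, the latent trait, the cut points, and the `cronbach_alpha` helper are assumptions for illustration. The summed score takes many more distinct values than any single item, which is part of why it is usually treated as an ordinary numeric variable in regression.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
n, k = 250, 5
# Hypothetical latent trait driving five 4-point items
trait = rng.normal(size=n)
raw = trait[:, None] + rng.normal(0, 1.0, size=(n, k))
# Cut the continuous responses into four ordered categories scored 1..4
items = np.digitize(raw, bins=[-1.0, 0.0, 1.0]) + 1

scale_score = items.sum(axis=1)  # possible range 5..20
print("distinct values, single item:", np.unique(items[:, 0]).size)
print("distinct values, summed scale:", np.unique(scale_score).size)
print("Cronbach's alpha:", round(cronbach_alpha(items), 2))
```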