Solved – Correlational study of ordinal data using a 5-point Likert scale

Tags: correlation, likert, ordinal-data, spearman-rho

How do I calculate a correlation using ordinal data from a 5-point Likert scale relating perioperative education to patient satisfaction scores? A numerical value (1: strongly agree – 5: strongly disagree) will represent the participant's satisfaction, as a patient, with the perioperative educational material. The numerical value is the patient's response to each of 5 specific statements.

The Likert scale is being used during a telephone survey to gather patient responses to 5 statements regarding educational material provided during his or her surgical experience. An example statement is, "The day surgery nurse provided clear and easy to understand verbal instruction regarding personal care once at home." The goal is to correlate patient satisfaction with the surgical experience to the patient education provided by the surgical team of nurses.

Question: What is the best way to analyze this data? Would Spearman's rank correlation be appropriate?

Best Answer

It depends on how much data you have, how much tolerance you have for complexity, and how much interest in accuracy. Some will say that treating everything as continuous is A-OK (generally also assuming normal distributions if correlations are to be interpreted substantively or tested against a null hypothesis), but others insist this is improper, and the latter group is the more technically correct. Since you say this is telephone survey data, it seems plausible that you might have hundreds or even thousands of observations. If so, you may have sufficient statistical power to do things the "right" way.
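For concreteness, here is a minimal sketch of that simple route in Python: average the five items into a single score and correlate it with the outcome. The DataFrame `df`, the column names `q1`–`q5`, and the `satisfaction` column are hypothetical stand-ins, and the data are simulated purely so the example runs.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
n = 300
# Simulated placeholder data purely so the example runs:
# five ordinal items (1-5) plus an outcome rating.
df = pd.DataFrame({f"q{i}": rng.integers(1, 6, size=n) for i in range(1, 6)})
df["satisfaction"] = rng.integers(1, 6, size=n)

items = [f"q{i}" for i in range(1, 6)]
score = df[items].mean(axis=1)  # CTT-style mean score across the five items

print(pearsonr(score, df["satisfaction"]))   # assumes interval-level, normal data
print(spearmanr(score, df["satisfaction"]))  # rank-based; safer for ordinal data
```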

The right way to begin is by developing a measurement model for your latent variable. Five statements rated on a common five-point Likert scale are just barely enough to satisfy the more lenient rules of thumb for deciding whether one can get away with applying classical test theory (CTT) assumptions:

  • The rating frequencies for any given item are symmetric, so that the mean, median, and mode coincide. Hence the ratings approximate a polychotomized normal distribution well enough to assume a latent, continuous, normal distribution in the construct being measured by that item. (A quick check of this assumption is sketched after this list.)
  • All items measure the same, single, continuous, normally distributed, latent construct equally well. Therefore the mean of these items' ratings is the score for this latent factor.
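The first of these assumptions is easy to inspect directly. Continuing with the hypothetical `df` and `items` defined above, a rough look at each item's frequency table and skewness:

```python
from scipy.stats import skew

# Continuing with the hypothetical `df` and `items` defined above.
for item in items:
    counts = df[item].value_counts().sort_index()
    print(item, counts.to_dict(), "skew:", round(skew(df[item]), 2))
# Marked skew (say |skew| > 1) suggests an item's ratings are not
# symmetric around a shared mean/median/mode.
```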

These assumptions imply ways in which CTT can fail:

  • Extreme responses may be popular on some items, especially if they are unusually easy or hard to agree with. Certain populations may also tend to give more extreme responses in certain contexts; see the first paragraph of my immediately previous answer and its links. If these tendencies don't cancel out across items, your factor score will be distributed less normally.
  • Items may not all relate equally to the latent construct they represent. This is practically guaranteed to some potentially non-negligible degree, unless the latent construct is defined by its items. In that case it might be better understood as an emergent construct (e.g., SES), i.e., one caused jointly by its components rather than one that causes variation in manifest indicators. If the mean is an operationalization of the latent construct (not the definition of an emergent construct), and you have no reason to think your items contribute systematically unique information to your estimate of the latent factor, then items with lower item-total correlations are probably measuring your latent construct with more error. Poorer indicators shouldn't carry the same weight as more reliable indicators in your calculation of the factor score, yet the simple mean of your items fails to adjust item weighting for imbalances in measurement error. (A basic item-total diagnostic is sketched after this list.)
  • Items may measure more than one construct (collectively or even individually). To whatever extent this is known and can be controlled statistically, it's better to separate systematic variation due to other "nuisance" constructs from the variation used to estimate the latent factor of primary interest. This is another good way to remove measurement error that CTT doesn't utilize. For instance, your example item may measure attitudes toward both the nurse and the instructions. If you had more items involving attitudes toward the nurse (though I doubt you do), you might be able to fit a bifactor model that would estimate how much variance in this item's responses is due to each of these latent attitudinal influences, thus controlling for nurse-specific attitudes.
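For the item-total point, corrected item-total correlations and Cronbach's alpha are the standard CTT diagnostics. A short sketch, again using the hypothetical `df` and `items` from above:

```python
import numpy as np

# Continuing with the hypothetical `df` and `items` from above.
X = df[items].to_numpy(dtype=float)
total = X.sum(axis=1)

for j, item in enumerate(items):
    rest = total - X[:, j]                # total score excluding this item
    r = np.corrcoef(X[:, j], rest)[0, 1]  # corrected item-total correlation
    print(item, "corrected item-total r:", round(r, 2))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)
k = X.shape[1]
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1))
print("Cronbach's alpha:", round(alpha, 2))
```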

If you have at least a couple hundred observations, some interest in the statistical process, and a desire to improve the validity and richness of your results, try fitting a rating scale model to your five items. This is an item response theory model that assumes ratings of all items use the same scale (hence it uses the same threshold estimates for all items) and are influenced by the same latent variable (which is probably all you can estimate with five items). It can be used to generate a continuously distributed estimate of the latent variable that accommodates the ordinal nature of Likert ratings, uses only the common variance in your items (thereby excluding any item-specific measurement error), and weights the items according to how much each has in common with the others.
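Dedicated IRT software is the proper tool for an actual rating scale model (in R, for example, the eRm and mirt packages cover it). As a rough Python stand-in for the item-weighting idea only, a one-factor model can at least weight items by their shared variance; note this sketch treats the 1–5 ratings as continuous, so it does not address the ordinality point, and the third-party `factor_analyzer` package is just one possible choice:

```python
# Requires the third-party `factor_analyzer` package; continues the
# hypothetical `df` and `items` from above. This treats the 1-5 ratings
# as continuous, so it is a stand-in for the weighting idea, not a true
# rating scale model.
from factor_analyzer import FactorAnalyzer
from scipy.stats import spearmanr

fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(df[items])
print(fa.loadings_)  # how strongly each item loads on the single factor

factor_scores = fa.transform(df[items])  # weighted latent-variable estimates
print(spearmanr(factor_scores.ravel(), df["satisfaction"]))
```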

You can produce factor scores for individuals using a rating scale model and then correlate those with your other variable, or you can fit a structural equation model that estimates the correlation along with all of the items' thresholds, loadings, and unique variances, plus the entire model's goodness of fit. For more info and some other alternatives, see "Factor analysis of questionnaires composed of Likert items" and "Regression testing after dimension reduction". The best choice will depend on the nature of the other variable you want to correlate your latent construct with (which, as far as I can tell, you haven't specified), and again on how large your sample is. Complex models take more data because they estimate more parameters, but they can provide more valid, precise estimates and tell you much more about your data. The worst-case scenario would probably be having data that violate CTT assumptions and too little of it to fit the appropriate model, so at least check that this doesn't apply.
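To give a flavor of the SEM route in Python, here is a sketch using the third-party `semopy` package and its lavaan-style model syntax. The latent-variable name and columns continue the hypothetical `df` above, and note that the default estimation treats the indicators as continuous, so a fully ordinal treatment would need additional setup or different software:

```python
# Requires the third-party `semopy` package; column names continue the
# hypothetical `df` from above. By default the indicators are treated as
# continuous, so this sketches the structure, not an ordinal-aware fit.
from semopy import Model

desc = """
Education =~ q1 + q2 + q3 + q4 + q5
satisfaction ~ Education
"""

model = Model(desc)   # measurement model plus the structural relation
model.fit(df)
print(model.inspect())  # loadings, the structural estimate, and variances
```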