Solved – Scale variable as count data – correct or not

count-data, negative-binomial-distribution, regression

In this paper (freely available via PubMed central), the authors use negative binomial regression to model the score on a 10-item screening instrument scored 0-40. This procedure assumes count data, which clearly isn't the case here. I'd like your opinions on whether this approach is acceptable, because I sometimes use the same instrument or similar ones in my work. If not, I'd like to know if there are any acceptable alternatives. More details below:

The scale used is the Alcohol Use Disorders Identification Test (AUDIT), a 10-item questionnaire designed as a screening instrument for alcohol use disorder and hazardous/harmful drinking. The instrument is scored from 0 to 40, and the scores are typically heavily skewed, with most respondents clustered at the low end and a long right tail.

To my understanding, count data assume that the events being counted are independent of one another – patients arriving at an emergency ward on a given day, fatalities in a certain group, etc. – independent of each other, though dependent on underlying variables. Furthermore, I believe a true count cannot have a fixed maximum, though perhaps this assumption can be relaxed when the theoretical maximum is very high compared to the maximum observed in the data?

When using the AUDIT scale, we do not have a true count. We have 10 items with a maximum total score of 40, though such high scores are rarely seen in practice. The scores on the items are naturally correlated with each other.
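To see why correlated items matter, here is a small numpy simulation (hypothetical numbers, not AUDIT data): a shared per-subject "severity" trait induces correlation among 10 items each scored 0-4, and the resulting total scores are far more dispersed than a single binomial with the same mean would allow.

```python
import numpy as np

rng = np.random.default_rng(7)
n_subj, n_items, levels = 2000, 10, 4   # 10 items, each scored 0-4 (hypothetical)

# A shared latent "severity" per subject makes the items correlated.
theta = rng.normal(-1.5, 1.2, size=n_subj)
p = 1.0 / (1.0 + np.exp(-theta))        # subject-specific per-step probability

# Each item is a Binomial(4, p) draw for that subject; totals run 0-40.
items = rng.binomial(levels, p[:, None], size=(n_subj, n_items))
scores = items.sum(axis=1)

m = scores.mean()
binom_var = m * (1.0 - m / 40.0)        # variance implied by a single Binomial(40, p) with the same mean
print(scores.var(), binom_var)          # sample variance far exceeds the binomial variance
```

The excess variance here is exactly the overdispersion that motivates moving from a binomial to a negative binomial model.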

The assumptions required for count data are thus violated. But is this still an acceptable approach? How serious are the violations of the assumptions? Are there circumstances under which this approach can be considered more acceptable? Are there any alternatives to this approach that don't involve reducing the scale variable to categories?

Best Answer

The AUDIT instrument is essentially a Likert scale. A set of questions (Likert items), with answers often on a five-point scale, is designed to get at some underlying phenomenon. The sum of responses to the set of questions, the Likert scale, is then used as the measure of the underlying phenomenon. Although Likert items are often on a scale of "strongly disagree" to "strongly agree," the application to measure a tendency toward "Alcohol Use Disorders" in this "Identification Test" is straightforward.

As noted in the Likert scale Wikipedia page, "Whether individual Likert items can be considered as interval-level data, or whether they should be treated as ordered-categorical data is the subject of considerable disagreement in the literature, with strong convictions on what are the most applicable methods." This dispute probably dates back through most of the 80+ years since Likert first proposed the scale: is each step along the scale equivalent, both within and among the items that make up the scale? The issue has been addressed on Cross Validated, as in answers to this question, one of the earliest questions asked on this site.

If you accept the idea that the scale has steps that are uniform (or close enough to uniform for the application at hand, perhaps averaged out by adding 10 different items, as in AUDIT), then several approaches to analysis are possible. One is to consider the response on the scale as a series of steps chosen or not chosen to move up the scale, with the same probability of moving up each of the steps.

This allows one to think of "n-point Likert scale data as n trials from a binomial process," as in a 2010 question from @MikeLawrence. Although responses to that question were not terribly supportive of that idea, it was not hard to find a 2014 study that used and extended this approach successfully to distinguish sub-populations with different binomial probabilities. Although a binomial process is often used to model count data, it can thus be used to model the number – the count – of steps that an individual took along the scale of "Alcohol Use Disorders."
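The "count of steps" reading can be sketched numerically. In this minimal numpy example (simulated scores, not real AUDIT responses; the step probability 0.15 is illustrative), each 0-40 score is treated as a Binomial(40, p) draw, p is fitted by maximum likelihood, and the model's implied variance is compared to the sample variance:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p_true = 40, 0.15                    # 40 "steps"; p_true is illustrative

scores = rng.binomial(n, p_true, size=1000)   # simulated, not real AUDIT data

p_hat = scores.mean() / n               # binomial MLE of the per-step probability
model_var = n * p_hat * (1 - p_hat)     # the binomial model *fixes* the variance at n*p*(1-p)...
sample_var = scores.var()               # ...and here the data agree, because they truly are binomial
```

With real scale data the two variances often disagree, which is where the binomial model's built-in mean-variance relation starts to bind.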

As @Scortchi noted in an answer to the question linked in the second paragraph, a limitation of the binomial model is that it imposes a particular relation between the mean and the variance of the response. The negative binomial removes that restriction, with loss of the easy interpretation provided by the simple binomial model. In the analysis, the extra parameter that needs to be fit uses up just one additional degree of freedom. In contrast, trying to specify different probabilities for each of the 40 Likert-item steps and their sum into the Likert scale would be daunting.
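That single extra parameter can be illustrated with a short simulation (hypothetical numbers, numpy only): generating overdispersed "scores" via the classic gamma-Poisson construction of the negative binomial, then recovering the dispersion parameter by the method of moments.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gamma-Poisson mixture: Poisson rates drawn from a Gamma distribution give a
# negative binomial marginally, with size (dispersion) parameter r = gamma shape.
r_true, mean_true = 2.0, 5.0            # illustrative values
lam = rng.gamma(shape=r_true, scale=mean_true / r_true, size=2000)
y = rng.poisson(lam)

m, v = y.mean(), y.var()
r_hat = m**2 / (v - m)   # method-of-moments size parameter; a Poisson would need v == m
```

The one fitted quantity `r_hat` is the single additional degree of freedom mentioned above; everything else is still summarized by the mean.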

As @MatthewGraves noted in his answer to this question, whether the negative binomial model is appropriate is best answered by examining the residuals. In the original study that developed AUDIT, a value of 8 or more on the 40-point scale had quite reasonable specificity and sensitivity for distinguishing those diagnosed with "hazardous or harmful alcohol use," across six different countries. So perhaps a two-population binomial model based on high-risk and low-risk populations, similar to the 2014 study linked above, would be better.
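A two-population model of that kind can be sketched as an EM fit of a two-component binomial mixture. This numpy sketch uses simulated "low-risk" and "high-risk" groups; the step probabilities 0.08 and 0.45 and the 70/30 split are illustrative assumptions, not figures from the AUDIT study.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
n = 40

# Simulated respondents: 700 low-risk (p = 0.08), 300 high-risk (p = 0.45).
scores = np.concatenate([rng.binomial(n, 0.08, 700), rng.binomial(n, 0.45, 300)])

# Precompute log binomial coefficients log C(n, k) for k = 0..40.
log_choose = np.array([lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                       for k in range(n + 1)])

def log_pmf(k, p):
    """Binomial(n, p) log-probability for integer scores k."""
    return log_choose[k] + k * np.log(p) + (n - k) * np.log1p(-p)

# EM for a two-component binomial mixture.
w, p1, p2 = 0.5, 0.05, 0.5              # crude starting values
for _ in range(200):
    l1 = np.log(w) + log_pmf(scores, p1)
    l2 = np.log(1 - w) + log_pmf(scores, p2)
    resp = 1.0 / (1.0 + np.exp(l2 - l1))  # responsibility of component 1
    w = resp.mean()
    p1 = (resp * scores).sum() / (resp.sum() * n)
    p2 = ((1 - resp) * scores).sum() / ((1 - resp).sum() * n)
```

With well-separated groups the EM fit recovers both step probabilities and the mixing weight, which is essentially what the 2014 study did to distinguish sub-populations.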

Those interested in AUDIT specifically should examine that original study. For example, although the need for a morning drink might seem to measure something completely different from the frequency of drinking, as @SeanEaster surmised, morning drinking has a weighted mean correlation of 0.73 with a scale of measures of alcohol intake. (That result is not surprising to someone who has had friends with alcohol use disorders.) AUDIT seems to be a good example of the tradeoffs needed to develop an instrument that can be used reliably across multiple cultures.
