psychometrics – How to Validly Reduce the Number of Items in a Published Likert Scale

Tags: likert, psychometrics, reliability, scales

[edits made in response to feedback, thanks 🙂]

Doh! More edits! Sorry!

Hello-

I am doing some rather rough-and-ready data collection with a survey sent out to healthcare staff, using a published scale about morale and related issues.

The only problem is that the scale is rather long given everything else in the survey, so I would like to reduce its size by cutting each subscale in half and using only half the items. My intuition is that this is fine, since the subscales are inter-correlated; while it's not ideal for publication-standard research, it should be acceptable for a bit of intra-organisational fact-finding.

I wondered if anyone had any thoughts on the validity of doing this, pitfalls, or anything else. References in particular are gratefully received, because my colleagues will need some convincing!

Many thanks,
Chris B

Edits:

Yes it is a validated scale with known psychometric properties.

It's unidimensional and it has subscales, if that's the right way to put it.

I'll be working at the subscale and total level, not the item level.

30 items, probably about 40-60 individuals.

Cheers!

Best Answer

Although there is still some information lacking (the number of individuals, and the number of items per subscale), here are some general hints about scale reduction. Also, since you are working at the questionnaire level, I don't see why its length matters so much for the analysis (after all, you will just report summary statistics, like total or mean scores).

I shall assume that (a) you have a set of K items measuring some construct related to morale, (b) your "unidimensional" scale is a second-order factor that might be subdivided into different facets, and (c) you would like to reduce your scale to k < K items so as to summarize subjects' total scale scores with sufficient accuracy while preserving the content validity of the scale.

About the content/construct validity of this validated scale: the number of items has certainly been chosen so as to best reflect the construct of interest. By shortening the questionnaire, you are actually reducing construct coverage. It would be good to check that the factor structure remains the same when considering only half of the items (which could also influence the way you select them, after all). This can be done using traditional FA techniques, as sketched below. You bear the responsibility of interpreting the scale in a spirit similar to that of its authors.
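A minimal sketch of that check, using scikit-learn's `FactorAnalysis` (the file name, the item-column naming, the number of facets, and the every-other-item selection below are all hypothetical placeholders, to be replaced with your own):

```python
# Check whether the facet structure survives item removal.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

responses = pd.read_csv("survey_responses.csv")  # assumed wide format: one column per item
full_items = [c for c in responses.columns if c.startswith("item")]
kept_items = full_items[::2]                     # e.g., keep every other item

n_facets = 3  # assumed number of facets in the published scale

fa_full = FactorAnalysis(n_components=n_facets, rotation="varimax").fit(responses[full_items])
fa_half = FactorAnalysis(n_components=n_facets, rotation="varimax").fit(responses[kept_items])

# Do the kept items still load on the same facets?
loadings_full = pd.DataFrame(fa_full.components_.T, index=full_items)
loadings_half = pd.DataFrame(fa_half.components_.T, index=kept_items)
print(loadings_full.loc[kept_items].round(2))
print(loadings_half.round(2))
```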

About score reliability: although it is a sample-dependent measure, score reliability decreases as the number of items decreases (cf. the Spearman-Brown formula); another way to see this is that the standard error of measurement (SEM) will increase (see "An NCME Instructional Module on Standard Error of Measurement" by Leo M. Harvill). Needless to say, this applies to every indicator that depends on the number of items (e.g., Cronbach's alpha, which can be used to estimate one form of reliability, namely internal consistency). Hopefully, this will not affect any between-group comparisons based on raw scores.
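To make these quantities concrete, here is a small sketch on simulated data (`cronbach_alpha`, `spearman_brown`, and `sem` are helper functions written for illustration, not library calls):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / var(total score))."""
    k = scores.shape[1]
    return k / (k - 1) * (1 - scores.var(axis=0, ddof=1).sum()
                          / scores.sum(axis=1).var(ddof=1))

def spearman_brown(rho: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * rho / (1 + (length_factor - 1) * rho)

def sem(sd_total: float, reliability: float) -> float:
    """Standard error of measurement: SD of total scores * sqrt(1 - reliability)."""
    return sd_total * np.sqrt(1 - reliability)

# Simulated data: 50 respondents, 30 items sharing one latent trait.
rng = np.random.default_rng(0)
theta = rng.normal(size=(50, 1))            # latent "morale" level
scores = theta + rng.normal(size=(50, 30))  # item response = trait + noise

alpha_full = cronbach_alpha(scores)
alpha_half = spearman_brown(alpha_full, 0.5)  # predicted alpha with half the items
print(f"alpha, 30 items: {alpha_full:.2f}; predicted for 15 items: {alpha_half:.2f}")
print(f"SEM, full scale: {sem(scores.sum(axis=1).std(ddof=1), alpha_full):.2f}")
```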

So, my recommendations (the easiest way) would be as follows (a sketch illustrating steps 2 to 4 appears after the list):

  1. Select your items so as to maximise construct coverage; check the dimensionality with FA and the coverage with univariate response distributions;
  2. Compare average inter-item correlations with previously reported ones;
  3. Compute the internal consistency of the full scale and of your composites; check that they agree with published statistics for the original scale (no need to test anything; these are sample-dependent measures);
  4. Compute the linear (or polychoric, or rank) correlations between original and reduced (sub)scores, to ensure that they are comparable (i.e., that individuals' locations on the latent trait do not vary to a great extent, as reflected in the raw scores);
  5. If you have an external subject-specific variable (e.g., gender, age, or better still, a measure related to morale), compare known-group validity between the two forms.
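A sketch of steps 2 to 4, reusing the hypothetical `survey_responses.csv` layout from the FA snippet above:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

def cronbach_alpha(x: np.ndarray) -> float:
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

def mean_interitem_r(x: np.ndarray) -> float:
    """Mean of the off-diagonal item intercorrelations."""
    r = np.corrcoef(x, rowvar=False)
    return r[np.triu_indices_from(r, k=1)].mean()

responses = pd.read_csv("survey_responses.csv")  # hypothetical file, as above
full_items = [c for c in responses.columns if c.startswith("item")]
kept_items = full_items[::2]
full = responses[full_items].to_numpy(dtype=float)
half = responses[kept_items].to_numpy(dtype=float)

# Step 2: average inter-item correlation, to compare with published values.
print(mean_interitem_r(full), mean_interitem_r(half))

# Step 3: internal consistency of the full scale vs. the reduced composite.
print(cronbach_alpha(full), cronbach_alpha(half))

# Step 4: agreement between original and reduced total scores.
print(pearsonr(full.sum(axis=1), half.sum(axis=1)))   # linear
print(spearmanr(full.sum(axis=1), half.sum(axis=1)))  # rank
```

One caveat: the step 4 correlation is somewhat inflated because the kept items are a subset of the full set; correlating the kept half against the discarded half gives a less optimistic check.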

The hard way would be to rely on Item Response Theory to select the items that carry the maximum information about the latent trait; scale reduction is actually one of its best applications. Models for polytomous items were partly described in this thread: Validating questionnaires.
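For intuition about "maximum information": in the simpler dichotomous two-parameter logistic (2PL) model, the information item $j$ contributes at trait level $\theta$ is

$$I_j(\theta) = a_j^2\, P_j(\theta)\bigl(1 - P_j(\theta)\bigr), \qquad P_j(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}},$$

so highly discriminating items (large $a_j$) whose difficulty $b_j$ sits where your respondents are contribute the most, and are the natural ones to keep. The polytomous case is analogous, with information summed over response categories.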

Update after your second round of edits

  1. Forget about any IRT models for polytomous items with so few subjects.
  2. Factor analysis will also suffer from such a small sample size; you will get unreliable factor loading estimates.
  3. 30 items divided by 2 = 15 items (it's easy to get an idea of the increase in the corresponding SEM for the total score; see the sketch below), but it will definitely get worse if you consider subscales (this was actually my second question: the number of items per subscale, if any).
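To make point 3 concrete, here is the back-of-the-envelope arithmetic (the 0.90 full-scale reliability is an assumed, illustrative value, not a property of your questionnaire):

```python
import math

def spearman_brown(rho: float, length_factor: float) -> float:
    return length_factor * rho / (1 + (length_factor - 1) * rho)

rho_full = 0.90                           # assumed reliability for 30 items
rho_half = spearman_brown(rho_full, 0.5)  # about 0.82 for 15 items
sem_ratio = math.sqrt(1 - rho_half) / math.sqrt(1 - rho_full)
print(rho_half, sem_ratio)                # SEM grows by roughly 35%
```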