Solved – References for creating neuropsychological composite scores

composite · standardization · validity

I am a doctoral student in clinical neuropsychology and am in the process of creating z-score unit-weighted composite scores based on a battery of ability tests (e.g., attention/executive functions composite = z-scores for Trails A & B, Stroop, Digit Span, RBANS Coding, Symbol Search). The scores will be used in a regression model involving a sample of 48 individuals with dementia and 32 healthy controls.

I am creating these scores using the recommendations on Jeromy Anglim's blog, but I am looking for some additional solid references on the pros and cons of using composite scores. I would prefer to use factor analysis; however, because I am working with a clinical sample (Alzheimer's disease), my sample size is too small.

  • Can anyone recommend readings on the use of unit-weighted composites of z-scores, and on their pros and cons?

Best Answer

1. Which component tests to combine

  • You need to determine whether it is meaningful to combine your battery of ability tests into an overall composite. This is separate from the issue of the weightings you use for the component variables. It ties into both the general literature on validity and scale construction and the more domain-specific literature on neuropsychological ability measurement. You probably want to combine empirical evidence (e.g., the pattern of correlations among component ability tests, both in your sample and in any other available samples) with theoretical evidence (e.g., theories of the construct you are trying to measure and the mapping between the component measures and that construct). The smaller your sample size, the more you should rely on previous research and theory.
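
A quick way to inspect that empirical evidence is simply to look at the intercorrelations of the component tests. The sketch below simulates some scores for that purpose; the column names and the data-generating model are purely illustrative, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for three component tests (names illustrative).
# A shared ability factor g drives the intercorrelations.
rng = np.random.default_rng(0)
n = 80
g = rng.normal(size=n)
scores = pd.DataFrame({
    "trails_a": 30 - 5 * g + rng.normal(scale=4, size=n),  # seconds; higher = worse
    "stroop": 50 + 8 * g + rng.normal(scale=5, size=n),
    "digit_span": 10 + 2 * g + rng.normal(scale=1.5, size=n),
})

# Inspect the pattern of intercorrelations before deciding to combine.
print(scores.corr().round(2))
```

With real data, you would replace the simulated frame with your observed test scores; a coherent pattern of moderate-to-strong correlations (negative for timed tests, where higher scores mean worse performance) is one piece of evidence that an overall composite is meaningful.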

2. How to combine chosen component tests

  • Converting component ability tests into z-scores before summing with unit weights is popular for several reasons.
    • First, ability tests are often on very different scales (milliseconds to complete, number of problems solved, percentage correct, number of errors, etc.) and have very different standard deviations. Thus, on the assumption that each test should be included in the composite, a reasonable starting point is to say that each test should contribute equally to the total. Various definitions of equal contribution are possible, but conversion to a z-score followed by unit weighting is one simple option.
    • Second, I find there is a bit of blurring between reflective and formative measurement models in the measurement of composite abilities. Reflective models propose that the tests reflect an underlying construct, and they treat the common variance between tests as representing the construct of interest. Formative models propose that the construct is represented by the combination of the individual measures; they do not rely on intercorrelations between component tests to justify combining them, but instead on theoretical arguments, or on arguments about how the component variables predict a common outcome. Factor analysis is a way of getting optimal weights for reflective measures, but this does not apply to formative measures. When measuring a broad-based ability construct, there is an expectation that component tests will intercorrelate, but there is also a desire to capture the full breadth of the construct. Thus, there can be theoretical arguments for adopting a unit-weighted approach even when factor analysis suggests an alternative set of weights for a first factor.
    • Third, psychological scales are often interpreted normatively, and thus combining z-scores, which scale variables in terms of variation between individuals, can seem natural.
    • Fourth, there is a desire for simplicity and replicability. Once the means and standard deviations are fixed, the transformation is easy to apply across studies. Or, if cross-study comparison is not needed, it is easy to apply using the sample mean and standard deviation of each component measure.
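
The steps above can be sketched in a few lines. The example below uses simulated data and illustrative test names (not the actual battery from the question); it also flips the sign of a timed test so that higher z-scores mean better performance on every component.

```python
import numpy as np
import pandas as pd

# Hypothetical raw scores (names and distributions are illustrative).
rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "trails_a_sec": rng.normal(35, 8, size=80),  # completion time; higher = worse
    "digit_span": rng.normal(10, 2, size=80),    # items correct
    "coding": rng.normal(45, 9, size=80),        # items correct
})

# Standardize each test using the sample mean and SD. One could instead
# fix the means and SDs from a normative or control sample so the same
# transformation can be reused across studies.
z = (raw - raw.mean()) / raw.std(ddof=1)

# Reverse-score the timed test so faster completion = higher z-score.
z["trails_a_sec"] *= -1

# Unit-weighted composite: the simple mean of the component z-scores.
composite = z.mean(axis=1)
print(composite.describe().round(2))
```

Taking the mean rather than the sum of the z-scores keeps the composite on a roughly z-like scale, which is often convenient for interpretation; either choice is a unit weighting.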

In the above, I'm not trying to say that unit-weighted sums of z-scores are the best way to form an ability composite; they are a relatively simple approach to the task of measuring and scaling multi-faceted ability constructs. I've merely tried to highlight why a researcher might adopt the approach.
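To make the contrast with factor-derived weights concrete, the sketch below compares unit weights with the weights implied by the first principal component of the correlation matrix (used here as a simple stand-in for a first-factor solution). The data and test names are simulated and illustrative only.

```python
import numpy as np
import pandas as pd

# Simulated z-scored tests with unequal loadings on a shared factor g.
rng = np.random.default_rng(2)
n = 80
g = rng.normal(size=n)
z = pd.DataFrame({
    "test_a": 0.9 * g + rng.normal(scale=0.4, size=n),
    "test_b": 0.7 * g + rng.normal(scale=0.7, size=n),
    "test_c": 0.5 * g + rng.normal(scale=0.9, size=n),
})
z = (z - z.mean()) / z.std(ddof=1)

# First eigenvector of the correlation matrix = first-PC weights.
eigvals, eigvecs = np.linalg.eigh(z.corr().to_numpy())
pca_w = np.abs(eigvecs[:, -1])          # component for the largest eigenvalue
pca_w = pca_w / pca_w.sum()             # normalize to sum to 1

unit_w = np.full(3, 1 / 3)
print("unit weights:", unit_w.round(2))
print("PCA weights: ", pca_w.round(2))
```

The data-driven weights favour the tests that correlate most strongly with the others, whereas unit weights deliberately give every component the same say; which is preferable depends on whether you take a reflective or formative view of the composite, as discussed above.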

A reference showing an example study

The following reference provides a case study of researchers discussing their rationale for combining component tests based on z-scores, and some of the issues that it entailed:

  • Cutter, G. R., et al. (1999). Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain, 122(5), 871-882. doi:10.1093/brain/122.5.871