w-score vs. z-score with covariates

Tags: modeling, regression-strategies, z-score

I want to predict gray matter (GM) volume in a group of patients from their degree of cognitive impairment, corrected for age and sex. To obtain a more ‘disease-specific’ measure of cognition, I use cognitive performance scores from a large group (N=500) of healthy controls (HC) as a reference.

My supervisor and I discussed two methods for doing this (the w-score method vs. the z-score method):

1. w-score method:

a. calculate the effect of age and sex on cognitive score in the HC group (cognition = a + (b * age) + (c * sex))

b. predict cognitive score in the patient group based on the regression coefficients we found in the HC group

c. for each patient, subtract this predicted score from his actual score, and divide by the SD of the HC’s residuals
(w-score = (cognition.obs – cognition.pred)/SDres)

d. perform a regression in which w-score predicts GM volume
(GM volume = a + (b * w-score))

2. z-score method:

a. calculate the mean and SD of cognitive score in the HC group

b. for each patient, subtract the HC’s mean from his actual cognitive score, and divide by the HC’s SD (z-score = (cognition.obs – cognition.mean)/SD)

c. perform a regression in which z-score predicts GM volume, using age and sex as covariates (GM volume = a + (b * z-score) + (c * age) + (d * sex))
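To make the two procedures concrete, here is a minimal NumPy sketch of both. Everything in it is an assumption for illustration: the data are synthetic, the helper names (`fit_ols`, `predict_ols`, `w_scores`, `z_scores`) are hypothetical, sex is coded 0/1, and the residual SD uses n - p degrees of freedom, which is one common choice the description above does not fix.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predicted values for covariates X given coefficients from fit_ols."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    return design @ coef

def w_scores(hc_cov, hc_cog, pt_cov, pt_cog):
    """Steps 1a-1c: standardized residuals relative to the HC regression."""
    coef = fit_ols(hc_cov, hc_cog)                      # 1a: fit in HC
    hc_resid = hc_cog - predict_ols(coef, hc_cov)
    # Residual SD with n - p degrees of freedom (an assumed convention).
    sd_res = np.sqrt(hc_resid @ hc_resid / (len(hc_cog) - len(coef)))
    pred = predict_ols(coef, pt_cov)                    # 1b: predict patients
    return (pt_cog - pred) / sd_res                     # 1c: standardize

def z_scores(hc_cog, pt_cog):
    """Steps 2a-2b: standardize against the HC mean and SD."""
    return (pt_cog - hc_cog.mean()) / hc_cog.std(ddof=1)

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
n_hc, n_pt = 500, 80
hc_cov = np.column_stack([rng.uniform(50, 85, n_hc), rng.integers(0, 2, n_hc)])
hc_cog = 30 - 0.10 * hc_cov[:, 0] + 1.0 * hc_cov[:, 1] + rng.normal(0, 2, n_hc)
pt_cov = np.column_stack([rng.uniform(50, 85, n_pt), rng.integers(0, 2, n_pt)])
# Patients get a different age/sex effect on cognition than HC.
pt_cog = 25 - 0.20 * pt_cov[:, 0] + 0.5 * pt_cov[:, 1] + rng.normal(0, 3, n_pt)
gm = 700 - 2.0 * pt_cov[:, 0] + 15 * pt_cov[:, 1] + 5 * pt_cog + rng.normal(0, 20, n_pt)

w = w_scores(hc_cov, hc_cog, pt_cov, pt_cog)
z = z_scores(hc_cog, pt_cog)

# 1d: GM volume = a + b * w-score
coef_w = fit_ols(w, gm)
# 2c: GM volume = a + b * z-score + c * age + d * sex
coef_z = fit_ols(np.column_stack([z, pt_cov]), gm)

print("w-score model [a, b]:", coef_w)
print("z-score model [a, b, c, d]:", coef_z)
```

Note the structural difference: in the w-score model, age and sex enter only through the HC-derived prediction, so the final regression has no age or sex terms of its own, whereas the z-score model estimates them directly in the patient group.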

My supervisor wants to use the w-score method (because it is similar to the use of ‘norm tables’ which are based on a HC group and have corrections for age/sex). I actually prefer the z-score method, because the effect of age/sex on cognition in my patient group is different from the age/sex effect in the HC group.

If the logic behind correcting for age and sex is that they are covariates/confounders in my regression (i.e. they relate directly to GM volume and might not be evenly distributed over cognitive scores), wouldn’t it make more sense to use the z-score method? That way, you correct for the actual effect of age/sex that exists in the patient group (instead of a different effect that only exists in the HC group).

I’m very curious about your opinions, thank you in advance.

Anita

Best Answer

IMHO this kind of normalization is not based on statistical principles, and such manipulations cause observations to be correlated even if they started out independent. You are also making the strong assumption that the standard deviation is an appropriate normalizing statistic and that you have estimated the SDs very precisely. The SD is useful for smooth, symmetric distributions without heavy tails. This may not apply to your data.
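As a small illustration of why the SD can be a fragile normalizer, the sketch below compares it with a scaled median absolute deviation (MAD) on toy data. The MAD is brought in here only as a point of comparison, not as something proposed above, and the data and function names are made up for the example.

```python
import numpy as np

def sd(x):
    """Sample standard deviation."""
    return np.std(x, ddof=1)

def scaled_mad(x):
    """Median absolute deviation, scaled by 1.4826 so it is comparable
    to the SD when the data are normally distributed."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

# Toy data: 99 evenly spaced values, then the same values plus one outlier.
clean = np.linspace(-2.0, 2.0, 99)
contaminated = np.append(clean, 50.0)

print("SD  clean / contaminated:", sd(clean), sd(contaminated))
print("MAD clean / contaminated:", scaled_mad(clean), scaled_mad(contaminated))
```

A single extreme value inflates the SD several-fold while leaving the MAD essentially unchanged, so any score divided by the SD would shrink for every subject.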

The best approach to statistical modeling is to spend a lot of time formulating a comprehensive model that takes into account all known sources of variability that you can measure. This model uses the raw data and leads to comparisons of real interest.
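One possible reading of "uses the raw data" is to put the raw cognitive score, age, and sex into a single regression for GM volume. The sketch below, on synthetic data with hypothetical HC reference values (`hc_mean`, `hc_sd`), also shows a fact that follows from linear algebra: z-scoring with fixed constants is only a linear rescaling, so the raw-score model and the z-score model with covariates give identical fitted values, and the cognition slope simply scales by the HC SD.

```python
import numpy as np

def ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

# Synthetic patient data, invented purely for illustration.
rng = np.random.default_rng(1)
n = 80
age = rng.uniform(50, 85, n)
sex = rng.integers(0, 2, n).astype(float)
cog = 25 - 0.2 * age + 0.5 * sex + rng.normal(0, 3, n)
gm = 700 - 2.0 * age + 15 * sex + 5 * cog + rng.normal(0, 20, n)

# Hypothetical HC reference values (assumed, not taken from any real sample).
hc_mean, hc_sd = 23.0, 2.5

# Raw-data model: GM = a + b * cognition + c * age + d * sex
coef_raw, fit_raw = ols([cog, age, sex], gm)

# z-score model: GM = a' + b' * z + c * age + d * sex
z = (cog - hc_mean) / hc_sd
coef_z, fit_z = ols([z, age, sex], gm)

print("raw slope * HC SD:", coef_raw[1] * hc_sd)
print("z-score slope:    ", coef_z[1])
print("identical fits:   ", np.allclose(fit_raw, fit_z))
```

The w-score model has no such equivalence, because it fixes the age and sex adjustment at the HC estimates instead of estimating it alongside cognition. A fuller model could add interaction or nonlinear terms, but which ones belong is a subject-matter decision this sketch does not attempt.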
