This is a great question!
I think that in scale construction there's a delicate balance between interpretability and psychometric considerations. Specifically, a sum or average of the raw items is much easier to grasp than a sum or average of standardized or otherwise re-scaled items.
However, there can be a somewhat subtle psychometric reason for re-scaling items prior to creating your scale composite (i.e., taking a sum or average). If your items have radically different standard deviations, the reliability of your composite scale will be decreased simply because of these differing standard deviations.
One way to understand this intuitively is to realize that, as you point out, items with widely varying standard deviations are assigned different weights in the composite. So, measurement error in the item with the greater standard deviation will tend to dominate the scale composite. In effect, having widely varying standard deviations reduces the very benefit that you're trying to accrue by averaging together multiple items (i.e., normally, averaging together multiple items reduces the impact of measurement error from any one of the component items).
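One way to see this algebraically is through the formula for Cronbach's alpha itself. For a composite $X = \sum_{i=1}^{k} x_i$ built from $k$ items,

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{x_i}}{\sigma^2_X}\right)$$

If you multiply one item by a large constant $c$, both the sum of the item variances and the composite variance $\sigma^2_X$ become dominated by that item's $c^2\sigma^2$ term, so the ratio approaches $1$ and $\alpha$ approaches $0$: in the limit, the "composite" is effectively a single-item measure.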
I have created a demonstration of the effects of a single dominant item in some simulated data below. Here I create five correlated items and find the reliability (measured with Cronbach's alpha) of the resultant scale.
library(psych)  # for alpha()
# Create five correlated items: item1 plus independent noise for items 2-5
set.seed(13105)
item1 <- round(rnorm(100, sd = 3), digits = 0)
item2 <- round(item1 + rnorm(100, sd = 1), digits = 0)
item3 <- round(item1 + rnorm(100, sd = 1), digits = 0)
item4 <- round(item1 + rnorm(100, sd = 1), digits = 0)
item5 <- round(item1 + rnorm(100, sd = 1), digits = 0)
d <- data.frame(item1, item2, item3, item4, item5)
# Cronbach's alpha
alpha(d)
Reliability analysis
Call: alpha(x = d)
raw_alpha std.alpha G6(smc) average_r mean sd
0.97 0.97 0.97 0.87 -0.14 2.5
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r
item1 0.96 0.96 0.94 0.84
item2 0.97 0.97 0.96 0.88
item3 0.97 0.97 0.96 0.89
item4 0.97 0.97 0.96 0.88
item5 0.96 0.97 0.96 0.87
Item statistics
n r r.cor r.drop mean sd
item1 100 0.98 0.99 0.97 -0.10 2.5
item2 100 0.94 0.92 0.90 -0.27 2.8
item3 100 0.93 0.91 0.89 -0.09 2.7
item4 100 0.94 0.92 0.91 -0.19 2.6
item5 100 0.94 0.93 0.91 -0.06 2.7
And here I change the standard deviation of item2 by multiplying the item by $5$. Note the dramatic drop in Cronbach's alpha due to this procedure: raw_alpha falls from 0.97 to 0.74, while std.alpha, which is computed from the correlation matrix, is unchanged. Indeed, multiplying an item by a positive constant does not affect the correlation matrix constructed with these five items in the slightest. The only thing that I have done by multiplying item2 by $5$ is change the scale on which item2 is measured, and yet changing this scale greatly impacts the reliability of the composite.
# Re-scale item 2 to have a much larger standard deviation than the other items
d$item2 <- d$item2 * 5
# Cronbach's alpha
alpha(d)
Reliability analysis
Call: alpha(x = d)
raw_alpha std.alpha G6(smc) average_r mean sd
0.74 0.97 0.97 0.87 -0.36 4.7
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r
item1 0.68 0.96 0.94 0.84
item2 0.97 0.97 0.96 0.88
item3 0.69 0.97 0.96 0.89
item4 0.68 0.97 0.96 0.88
item5 0.68 0.97 0.96 0.87
Item statistics
n r r.cor r.drop mean sd
item1 100 0.98 0.99 0.96 -0.10 2.5
item2 100 0.94 0.92 0.90 -1.35 13.9
item3 100 0.93 0.91 0.86 -0.09 2.7
item4 100 0.94 0.92 0.89 -0.19 2.6
item5 100 0.94 0.93 0.90 -0.06 2.7
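The output above also points to the fix: std.alpha is still 0.97 because it is based on the correlation matrix, so z-scoring the items before forming the composite should bring raw_alpha back in line with it. A minimal sketch, reusing the data frame d from above:

```r
# Sketch: z-score each item before forming the composite
# (scale() standardizes the columns; as.data.frame() keeps alpha() happy)
d_std <- as.data.frame(scale(d))
alpha(d_std)
```

With standardized items the variances are all $1$, so no single item dominates the composite and raw_alpha should agree with std.alpha (about 0.97 in this simulation).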
Best Answer
I'm not sure about the first part of your question. But regarding the second bit: a reliability analysis by itself does not tell you whether you have one underlying construct or several. You can have a high Cronbach's alpha (for reliability) in the presence of two or more factors. Definitely run the factor analysis as well as the reliability analysis. You might also want to check out the latent-variable and item-response literature. Some of these models are set up to handle dichotomous and polytomous outcomes, which might deal with the z-score problem as well.
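For the factor-analysis side, a minimal sketch with the psych package (assuming your items sit in a data frame named d; the nfactors value here is purely illustrative, so set it to whatever the parallel analysis suggests):

```r
library(psych)
# Parallel analysis to suggest how many factors to retain
fa.parallel(d)
# Fit an exploratory factor model with the suggested number of factors
fa(d, nfactors = 2)
```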