How to weigh several variables at two levels to create an overall composite variable score

Tags: composite, correlation, latent-variable

Context

I am trying to find the correlation between two latent variables. Let's call them A and B.

  • A has two dimensions. Let's call them Dimension 1 and 2.
  • B also has two dimensions. Let's call them Dimension 3 and 4.
  • The four dimensions (1, 2, 3 and 4) have several indicators each. Let's call them (a) and (b) for Dimension 1, (c), (d) and (e) for Dimension 2, (f) and (g) for Dimension 3 and (h), (i) and (j) for Dimension 4.
  • I have measured the ten indicators (a, b, c, d, e, f, g, h, i and j) on a Likert scale or else using cardinal numbers (e.g. one, two, three, four, etc.).
  • I have then weighted the indicators using the Likert scale codes that I created (e.g. 0 = not important, 1 = low importance, 2 = medium importance, 3 = high importance and 4 = absolute importance).
  • I have then added the scores for all indicators to arrive at the dimension scores. This is my first weighting.
  • I have then said that each dimension contributes 50% to the composite score of A or B and have standardised the dimension scores to 50% each. This is my second weighting.
  • Finally I added the dimension scores to arrive at the composite score.

To give an example:

If I have a score of 2 for (a) and 3 for (b), then my total for Dimension 1 is 5. Similarly, if I have a score of 3 for (c), 4 for (d) and 2 for (e), then my total for Dimension 2 is 9. The scores of 5 and 9 are the dimension scores, based on the Likert scale weights.

Since each dimension contributes 50%, I have divided the obtained score for each dimension by its maximum possible score and multiplied the result by 50%. This gives the following (rounded to two decimals):

Dimension 1: (2+3)/8 x 50% = 0.31 (two Likert items, so the maximum score is 4 plus 4)

Dimension 2: (3+4+2)/12 x 50% = 0.38 (three Likert items, so the maximum score is 4 plus 4 plus 4)

I then added 0.31 and 0.38 and got 0.69. This is my composite score for Composite Variable A (which has 2 dimensions and 5 indicators; 2 for the first dimension and 3 for the second dimension).

To make it comprehensible, I have multiplied 0.69 by 100 to get a score of 69 out of 100. I want to run a correlation on the composite scores of A and B to see the relationship between them.
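The arithmetic above can be sketched in Python as follows. This is only an illustration of the scheme described in this question; the function name `composite_score` and its `max_item_score` parameter are made up for the example and assume every indicator is scored from 0 to 4.

```python
def composite_score(dimensions, max_item_score=4):
    """Return a 0-100 composite score.

    dimensions: a list of lists, one list of item scores per dimension.
    Every dimension gets the same weight (50% each when there are two).
    """
    weight = 1.0 / len(dimensions)
    total = 0.0
    for items in dimensions:
        # Share of the maximum possible score reached in this dimension
        proportion = sum(items) / (max_item_score * len(items))
        total += weight * proportion
    return 100 * total


# Composite A from the example: Dimension 1 = (a, b), Dimension 2 = (c, d, e)
score_a = composite_score([[2, 3], [3, 4, 2]])
print(score_a)  # 68.75
```

The unrounded result is 68.75, which rounds to the 69 quoted above.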

As you can see, I have not done any complex statistical modelling using factor analysis etc. Instead, I arrived at my final score using a system of subjective weighting based on relative importance (which is aligned to the Likert scale coding) in the first instance and equal weighting in the second instance. I have used this approach to keep everything simple. I know it is subjective weighting, but I can justify it to some extent with theory and the rest with common sense!

Questions

  • Am I doing this right? Is this approach robust enough?
    Remember, I don't want to complicate things! I am not statistically minded!

  • Can I do correlation analysis on the composite scores of A and B? They are rank scores (based on weights) so is the Spearman Rank Correlation the way to go for the analysis?

Best Answer

The composite variable

  • Your algorithm for calculating scale scores seems fairly standard. Do you have a question about it?
  • In most cases, taking the mean of all items in the scale and taking the mean of the subscale means will give similar answers, provided the number of items per subscale is similar.
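As a rough illustration of that second point, here is a small Python sketch using the item scores from the question's own example. The variable names are just for this example.

```python
from statistics import mean

# Item scores from the example in the question
dimension_1 = [2, 3]      # indicators (a), (b)
dimension_2 = [3, 4, 2]   # indicators (c), (d), (e)

# Option 1: mean of all items, so every item gets equal weight
mean_of_items = mean(dimension_1 + dimension_2)

# Option 2: mean of the subscale means, so every dimension gets equal weight
mean_of_subscales = mean([mean(dimension_1), mean(dimension_2)])

print(mean_of_items)      # 2.8
print(mean_of_subscales)  # 2.75
```

The two values differ only slightly here. Option 2 corresponds to the 50/50 weighting in the question: 2.75 out of a maximum of 4 is 68.75%.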

Estimating correlations between latent variables

Structural equation modelling adjusts estimates of correlations between latent variables for the reliability of measurement, which is estimated from the intercorrelations between the items and the model.

Manually calculating scale scores (as you appear to be doing) and correlating them does not involve such an adjustment. You are therefore correlating observed variables, not estimating the correlation between latent variables. Many researchers, probably the majority, report correlations between observed variables, so if you adopt this statistically simpler approach you would be in good company. However, if you are particularly interested in estimating correlations between latent variables, then I would encourage you to explore structural equation modelling approaches.

An alternative approach is just to adjust the correlation based on some estimate of reliability of the two variables (see this discussion).
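One common form of this adjustment is the classical correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. A minimal Python sketch follows; the function name `disattenuate` and the numbers are hypothetical, and the reliabilities are assumed to come from some estimate such as Cronbach's alpha.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_observed / sqrt(reliability_x * reliability_y)


# Hypothetical values, for illustration only
r_corrected = disattenuate(r_observed=0.40, reliability_x=0.80, reliability_y=0.70)
print(round(r_corrected, 2))  # 0.53
```

The corrected value is always at least as large in magnitude as the observed one, and with low reliabilities it can even exceed 1, so it is best read as a rough estimate.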

Correlations on scales based on Likert scales

See this answer by @chl on whether to treat Likert scales as interval or ordinal. My opinion is that once you are summing over a reasonable number of items, treating the data as interval is typically useful.
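If you want to see how much the choice matters for your data, you could compute both the Pearson correlation (interval treatment) and the Spearman rank correlation (ordinal treatment) on the composite scores. The sketch below uses only the Python standard library; the scores are invented purely for illustration and the helper names are not from any particular package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented composite scores (0-100) for six respondents
scores_a = [69, 55, 80, 42, 73, 61]
scores_b = [64, 50, 85, 47, 70, 66]

print("Pearson: ", round(pearson(scores_a, scores_b), 3))
print("Spearman:", round(spearman(scores_a, scores_b), 3))
```

If the two coefficients are close, the interval-versus-ordinal question makes little practical difference for your conclusions.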