Solved – Constructing an index: is it necessary to standardize? Which is the best solution?

I'm constructing an index as the sum of three variables, each of which lies in the range [0,1] (they have been divided by their theoretical maximum).

Depending on the sample analyzed, the three variables are not guaranteed to have the same variance (or standard deviation). To give them equal importance, one solution would be to standardize them before summing (subtracting the mean and dividing by the standard deviation). The index would then be the sum of standardized scores.

My questions are:
– Is standardization strictly required? Does it have drawbacks other than changing the scale? I cannot always assume that the variables follow a specific distribution.
– Would you recommend standardization, or is there a more appropriate solution?
– Could the sum of unstandardized scores be OK?

(Cross-posted from https://www.statalist.org/forums/forum/general-stata-discussion/general/1423416-constructing-an-index-is-it-necessary-to-standardize.)

EDIT:
I'm adding a data example after Placidia's answer, to provide a better description of my case and make the question more generalizable.
Please consider an example for the three variables I want to sum in a final index and their transformations as in the following table. I have ten observations to rank.

The final index can be obtained as: (1) the sum of the raw values divided by their theoretical maximum; (2) the sum of the values rescaled to [0,1]; (3) the sum of the standardized values (subtract the mean and divide by the SD); or (4) the sum of percentile scores. Please see the table and the graph:

Different choices could produce different rankings for the 10 observations.
Which solution would be more appropriate? Is there a totally wrong one?
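
To make the four constructions concrete, here is a minimal Python sketch. The data, the theoretical maxima, and the variable names (`v1`, `v2`, `v3`) are all hypothetical and are not the values from my table; "rescaled to [0,1]" is assumed to mean min-max rescaling, and the percentile score is assumed to be the share of observations less than or equal to a value, which is only one of several conventions.

```python
from statistics import mean, stdev

# Hypothetical raw scores for 10 observations on three variables,
# with assumed theoretical maxima (none of these numbers come from the question).
raw = {
    "v1": [12, 35, 48, 20, 41, 30, 25, 44, 18, 38],
    "v2": [4, 5, 6, 5, 4, 6, 5, 5, 4, 6],
    "v3": [70, 20, 55, 90, 35, 60, 80, 25, 45, 65],
}
theoretical_max = {"v1": 50, "v2": 10, "v3": 100}

def by_max(x, tmax):
    # (1) divide by the theoretical maximum
    return [v / tmax for v in x]

def min_max(x):
    # (2) rescale so the sample minimum maps to 0 and the sample maximum to 1
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def z_score(x):
    # (3) subtract the sample mean, divide by the sample SD
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

def percentile_rank(x):
    # (4) assumed convention: share of observations <= the value
    n = len(x)
    return [sum(1 for w in x if w <= v) / n for v in x]

def index(transform):
    # Apply the transform to each variable, then sum across variables.
    cols = [transform(name) for name in raw]
    return [sum(row) for row in zip(*cols)]

indices = {
    "max": index(lambda k: by_max(raw[k], theoretical_max[k])),
    "minmax": index(lambda k: min_max(raw[k])),
    "z": index(lambda k: z_score(raw[k])),
    "pct": index(lambda k: percentile_rank(raw[k])),
}

def ranking(scores):
    # Observation numbers (1-based), ordered from highest to lowest index value.
    return [i + 1 for i in sorted(range(len(scores)), key=lambda i: -scores[i])]

for name, scores in indices.items():
    print(name, ranking(scores))
```

Comparing the four printed rankings shows whether the choice of transformation changes the ordering of the observations for a given data set.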

Best Answer

Since you have divided the variables by their theoretical maximum and the 3 variables now lie in [0,1], you have effectively standardized the variables. As @ttnphns points out, there are many ways of standardizing, and you have already chosen one. The 3 values are now of the same order of magnitude.

Statistical standardization is not always required or recommended. Suppose the index is a consumer price index, and you sample what a group of people spend on rent and what they spend on chocolate. In building the index, you don't want to standardize the variables because in the real world, rent really is more important than chocolate and takes up a larger part of someone's budget. The common element is the time frame of the expenses (1 month, say) and we are interested in the actual dollars spent.
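
The rent and chocolate point can be sketched in a few lines of Python. The spending figures below are made up purely for illustration, and `z_score` and `ranking` are hypothetical helper names; the sketch only contrasts an index in actual dollars with a sum of z-scores.

```python
from statistics import mean, stdev

# Hypothetical monthly spending in dollars for five people (illustrative only).
rent = [1000, 1200, 900, 1500, 1100]
chocolate = [50, 10, 60, 5, 30]

def z_score(x):
    # Subtract the sample mean and divide by the sample SD.
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

# Index in actual dollars: rent dominates because it is the larger expense.
dollar_index = [r + c for r, c in zip(rent, chocolate)]

# Index of summed z-scores: rent and chocolate are forced to count equally.
z_index = [r + c for r, c in zip(z_score(rent), z_score(chocolate))]

def ranking(scores):
    # Person numbers (1-based), ordered from highest to lowest index value.
    return [i + 1 for i in sorted(range(len(scores)), key=lambda i: -scores[i])]

print("dollars: ", ranking(dollar_index))
print("z-scores:", ranking(z_index))
```

With these made-up numbers, person 2 is the second-largest spender in dollars but ranks last on the z-score index, because a below-average chocolate bill is made to count as much as an above-average rent.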

I never like to be in a situation where I have to standardize variables, because it implies that I am grouping together variables that measure different things on different scales. This raises the question of why I am even looking at those quantities together. Sometimes it makes sense to do so, but it often doesn't. One danger of glib standardization is that it may lure people into a false sense of confidence. They believe that a statistical dodge allows them to throw a whole bunch of variables into the hopper without asking themselves whether their project makes sense.
