What to do with categorical data when calculating standardized z-scores

categorical-data, ordinal-data, standardization

I have numerous environmental variables that I'd like to correlate with some tree species data. The environmental variables vary greatly in scale, so I'd like to standardize each one by calculating standard z-scores (mean=0, SD=1). However, the environmental data consist of a mix of continuous, integer, ordinal, and nominal variables, and I'm not sure how to go about standardizing the categorical ones.

My main two questions:

  1. Are ordinal data treated the same as continuous data when calculating standardized z-scores?

  2. How do I approach nominal variables when calculating standardized z-scores?

Best Answer

Are ordinal data treated the same as continuous data when calculating standardized z-scores?

No, they are not. When dealing with data on different measurement scales, your analysis should not use mathematical operations that are not meaningful within that measurement scale. For ordinal data, only the ranking of the values in the scale is meaningful, so you should only use operations that are invariant to every renumbering of the values that preserves rank-order. This rules out any operation that uses the arithmetic operations $+$, $-$, $\times$ and $\div$.

For ordinal data, the sample mean and sample standard deviation are not invariant to all changes in the numbering of values that preserve rank-order. This means that the sample mean and sample standard deviation are meaningless for ordinal data. Consequently, the z-score is also meaningless.
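The non-invariance can be seen in a small Python sketch. The ordinal observations and the two numeric codings below are made up purely for illustration; both codings preserve the same rank order (low < medium < high), yet they give different z-scores and different relative spacing between the categories.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return z-scores using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical ordinal observations (illustrative only).
observations = ["low", "medium", "high", "high", "medium", "low", "high"]

# Two arbitrary numeric codings that preserve the same rank order.
coding_a = {"low": 1, "medium": 2, "high": 3}
coding_b = {"low": 1, "medium": 2, "high": 10}

# Map each category to the z-score it receives under each coding.
z_a = dict(zip(observations, z_scores([coding_a[x] for x in observations])))
z_b = dict(zip(observations, z_scores([coding_b[x] for x in observations])))

def gap_ratio(z):
    """Ratio of the high-medium gap to the medium-low gap in z-score units."""
    return (z["high"] - z["medium"]) / (z["medium"] - z["low"])

ratio_a = gap_ratio(z_a)
ratio_b = gap_ratio(z_b)

print("z-scores under coding A:", z_a)
print("z-scores under coding B:", z_b)
print("gap ratio A:", round(ratio_a, 6), "| gap ratio B:", round(ratio_b, 6))
```

Under coding A the two gaps are equal, while under coding B the high-medium gap is eight times the medium-low gap. Since nothing in the ordinal scale says which coding is "right", the resulting z-scores reflect the arbitrary numbering rather than the data.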

(Note: In some cases researchers treat apparently ordinal data as if it were interval or ratio data, which amounts to asserting that the differences/ratios between the ordered categories are meaningful. In such cases there is often some argument over whether it is justifiable to treat the data as being on a higher measurement level.)

How do I approach nominal variables when calculating standardized z-scores?

Nominal and ordinal variables do not allow use of the arithmetic operations $+$, $-$, $\times$ and $\div$, so the z-score for these variables is meaningless. For a nominal variable the only meaningful measures are those that count frequencies/relative frequencies of the categories and use the operations $=$ and $\neq$. For ordinal variables you also have meaningful measures for cumulative frequencies/relative frequencies using the operations $<$ and $>$ (taken in the order for the ordinal variable).
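As a rough sketch of the measures that remain meaningful, the Python snippet below computes frequencies and relative frequencies for a nominal variable, and cumulative frequencies for an ordinal one. The variable names (`soil_type`, `moisture`), the category labels, and the data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical nominal variable: only equality comparisons are used.
soil_type = ["clay", "sand", "loam", "clay", "loam", "clay"]
counts = Counter(soil_type)
rel_freq = {category: c / len(soil_type) for category, c in counts.items()}

# Hypothetical ordinal variable: the category order is supplied explicitly.
levels = ["low", "medium", "high"]
moisture = ["low", "medium", "high", "high", "medium", "low", "high"]
level_counts = Counter(moisture)

# Cumulative counts follow the ordinal order (observations at or below each level).
cum_counts = dict(zip(levels, accumulate(level_counts[lv] for lv in levels)))
cum_rel_freq = {lv: c / len(moisture) for lv, c in cum_counts.items()}

print("nominal counts:", dict(counts))
print("nominal relative frequencies:", rel_freq)
print("ordinal cumulative counts:", cum_counts)
print("ordinal cumulative relative frequencies:", cum_rel_freq)
```

None of these summaries depends on assigning numbers to the categories, so they are unaffected by how the categories happen to be coded.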
