[Math] Finding correlation in large data, non-numeric sets

st.statistics

Suppose I collect a lot of data from a group of persons, like

  • their height
  • their weight
  • color of eyes (chosen from eg the four categories blue/brown/black/other)
  • sex
  • day of the week the measurement was done

and I want to find out which of the other variables correlate with height.

For weight, I can imagine how that goes: just form the set of (x, y) pairs and compute something like Pearson's correlation. But I get stuck on things like 'color of eyes', where there's no natural way to order the categories, or on binary data like sex. The same goes for day of the week (where I'd hope to find no correlation at all): I wouldn't know how to order the days.

How do I do this?

Best Answer

There are lots of textbooks under headings like "Categorical Data Analysis", "Non-Parametric Statistics", et cetera, and all the standard statistical packages — GenStat, SAS, SPSS, and other names that will make me sound real ancient if I mention them — will have lots of practical advice about their usage on this kind of data.
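As one concrete illustration of what those textbooks cover (a sketch, not the only option): a common way to relate a numeric variable such as height to an unordered categorical one is the correlation ratio, eta squared, which is the effect size behind one-way ANOVA. It needs no ordering of the categories; it only asks how much of the variance in height is explained by the group means. The function name `correlation_ratio` and the data below are made up for illustration.

```python
from collections import defaultdict
from statistics import fmean


def correlation_ratio(categories, values):
    """Return eta squared: the share of the variance in `values`
    explained by group membership in `categories`.
    0 means all group means coincide; 1 means the groups account
    for all of the variation."""
    # Collect the numeric values belonging to each category.
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    grand_mean = fmean(values)
    # Between-group sum of squares: each group's size times the squared
    # distance of its mean from the grand mean.
    ss_between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Total sum of squares around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return ss_between / ss_total if ss_total else 0.0


# Made-up illustrative data, not real measurements.
heights = [180, 175, 165, 160, 178, 162, 170, 168]
sex = ["m", "m", "f", "f", "m", "f", "m", "f"]
weekday = ["mon", "tue", "mon", "tue", "wed", "wed", "thu", "thu"]

eta2_sex = correlation_ratio(sex, heights)      # large: sex explains much of the spread
eta2_day = correlation_ratio(weekday, heights)  # small: weekday explains little
print(f"sex: {eta2_sex:.3f}, weekday: {eta2_day:.3f}")
```

For a binary variable such as sex, eta squared equals the square of the point-biserial correlation, so this agrees with Pearson's r computed on a 0/1 coding. Note that with many categories and few observations eta squared will not be exactly zero even when there is no real effect, so in practice it is usually paired with a significance test such as the ANOVA F-test or, without normality assumptions, the Kruskal-Wallis test.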
