Interpret categorical correlation coefficients

classification, correlation, machine learning, neural networks, statistical significance

As my dataset contains many nominal categorical variables (as inputs) and a binary variable as the output, I am using Cramer's V to measure the association between the different input features, and between each input feature and the output variable.

However, unlike the Pearson correlation coefficient, which is easy to interpret and put into words, I do not understand how to interpret Cramer's V.

Example:

Pearson correlation coefficient (Age, salary) = 0.95

Age and salary are highly positively correlated, meaning that as age increases, salary also increases (and vice versa).

However, for categorical variables like the ones below:

Cramer's V coefficient (Industry Segment, Target met) = 0.90

How should this be interpreted?

Cramer's V coefficient (Regional movie industry, National movie industry)= 0.90

How should this be interpreted?

Best Answer

There are two aspects here: the mathematical and the linguistic.

OP and I spoke in a separate chat about what he was curious about, and it was the conceptual/linguistic aspect (not the mathematical one). For future readers, I'm leaving a link that succinctly explains the mathematical basis and calculation of Cramer's V under "Calculation". Read here about Cramer's V.
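
As a quick reference, a common way to write the calculation is shown below. The symbols are notation chosen here for illustration, not taken from the linked page.

$$V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\; c - 1)}}$$

Here $\chi^2$ is the Pearson chi-squared statistic of the contingency table of the two variables, $n$ is the total number of observations, and $r$ and $c$ are the numbers of rows and columns (categories of each variable). Because of the square root, $V$ is never negative.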

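Linguistically, "correlation" isn't exactly the right term to describe the relationship between two nominal variables, since they don't "vary" together: their values have no order, so they can't increase or decrease. They just change categories/groups, so you're really applying statistics to how often certain categories "coincide".

Therefore, two nominal variables are said to have an "association" (as just a general term), rather than a "correlation". For Cramer's V, the closer the value is to 1, the stronger the association between the variables; the closer to 0, the weaker the association. Unlike Pearson's coefficient, it has no sign, so it tells you how strongly the categories coincide but not in which "direction". A value of 0.90 for (Industry Segment, Target met) would be read as: industry segment and meeting the target are strongly associated, meaning that knowing the industry segment tells you a lot about whether the target was met.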
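
To make the "how often categories coincide" reading concrete, here is a minimal sketch in plain Python. The function name `cramers_v` and the counts are hypothetical, made up for illustration. It assumes every row and column of the table has at least one observation and applies no bias correction.

```python
import math

def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes every row total and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared: compare observed counts with the counts
    # expected if the two variables were independent.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = industry segment (A, B),
# columns = target met (yes, no).
perfect = [[50, 0], [0, 50]]    # segment fully determines the outcome
flipped = [[0, 50], [50, 0]]    # same table with the columns swapped
unrelated = [[25, 25], [25, 25]]  # segment says nothing about the outcome

print(cramers_v(perfect))    # 1.0
print(cramers_v(flipped))    # 1.0
print(cramers_v(unrelated))  # 0.0
```

The `perfect` and `flipped` tables give the same value, which illustrates why there is no "positive" or "negative" association for nominal variables: relabeling or reordering the categories does not change how strongly they coincide.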
