Solved – Chi-square & Fisher’s exact test output interpretation

chi-squared-test, fishers-exact-test, interpretation

I want to investigate the association between two nominal variables. I ran a chi-square test and Fisher’s exact test in SPSS.

[Image: SPSS output for the chi-square and Fisher’s exact tests]

I also computed Cramer’s V (V = 0.444, p = 0.000) and the phi coefficient (phi = 0.768, p = 0.000).
My interpretation of the results is:

  • The two variables are dependent
  • The relationship is strong

Am I right? Thanks in advance.

Best Answer

Yes, you're correct on both counts. These statistics strongly support the conclusion that the effect is statistically significant and that the effect size is large compared to what is typical for explanatory or control variables in a scientific or business analysis.

We can make this concrete by looking at an example of a contingency table with similar summary statistics:

$$ \begin{bmatrix} 10 & 3 & 2 & 1 \\ 2 & 10 & 2 & 3 \\ 1 & 2 & 11 & 3 \\ 0 & 3 & 2 & 12 \end{bmatrix} $$

Here, the accuracy is 64%. That is to say, with four possible class labels, each of which is approximately equally likely, the row and column labels agree 64% of the time, compared to the ~25% predicted under the independence hypothesis. So it should be intuitively clear that this is an example of two variables that exhibit strong dependence.
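The 64% and ~25% figures can be checked directly from the table. The following is a minimal sketch in plain Python; the variable names are my own, and "chance agreement" is taken here to mean the diagonal mass expected from the row and column margins if the two variables were independent.

```python
# Example contingency table from the answer
# (rows and columns share the same four class labels).
table = [
    [10, 3, 2, 1],
    [2, 10, 2, 3],
    [1, 2, 11, 3],
    [0, 3, 2, 12],
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Observed agreement: share of observations on the main diagonal.
accuracy = sum(table[i][i] for i in range(len(table))) / n

# Agreement expected under independence:
# sum over labels of P(row = i) * P(column = i).
chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2

print(f"n = {n}, accuracy = {accuracy:.3f}, chance agreement = {chance:.3f}")
# prints: n = 67, accuracy = 0.642, chance agreement = 0.251
```

The diagonal holds 43 of the 67 observations, which is where the 64% comes from.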

The summary statistics for this example are similar to what you report: $\chi^2 = 58.279$, $df = 9$, $p = 2.87\mathrm{e}{-09}$ for Pearson's chi-squared test (the continuity correction only applies to $2 \times 2$ tables, so it has no effect here). For effect sizes, $\phi = 0.746$ and $V = 0.53$. (Because you say the variables are nominal, Cramer's V is slightly preferred to Pearson's $\phi$.) All of this is in the same ballpark as what you report, and suggests that your data exhibit a similarly obvious and strong relationship.
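For readers who want to reproduce the chi-squared statistic and Cramer's V for the example table, here is a minimal sketch in plain Python. It assumes the standard definition $V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}$ for an $r \times c$ table; the variable names are illustrative only, and the p-value is left out because it needs a chi-squared distribution function from a statistics library.

```python
import math

# Example contingency table from the answer.
table = [
    [10, 3, 2, 1],
    [2, 10, 2, 3],
    [1, 2, 11, 3],
    [0, 3, 2, 12],
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected,
# where expected counts come from the margins under independence.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

n_rows, n_cols = len(table), len(table[0])
dof = (n_rows - 1) * (n_cols - 1)

# Cramer's V rescales chi-squared by its maximum possible value, n * (min(r, c) - 1).
cramers_v = math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

print(f"chi2 = {chi2:.3f}, df = {dof}, Cramer's V = {cramers_v:.3f}")
# prints: chi2 = 58.279, df = 9, Cramer's V = 0.538
```

The computed V of about 0.538 is the value quoted as 0.53 in the answer.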

When we look for guidance on interpreting effect sizes, we'll see different opinions, but in general a Cramer's V over 0.5 is considered extremely strong. However, such broad guidelines are not very useful without context. For example, one place where I use Cramer's V in my work is in data quality checks between databases. There, Cramer's V is used to compare database columns containing categorical data, and I expect to see values over 0.9; otherwise the data quality is suspect. In that context a V of 0.53 would be considered much too low, not "extremely strong." The point is that the interpretation depends very much on context. But when analyzing a curated data set whose variables have different semantic meanings (say, Highest Education Achieved and Income Level), V = 0.53 would indicate an extremely strong relationship.

As you have opted not to provide details on the context, I assume that it's sensitive and/or proprietary, but I hope these guidelines and examples give you a point of reference while you draw your own conclusions.