I have a dataset with a particular response variable that I'm interested in and numerous predictor variables. All variables are nominal and have as many as 15 possible values. When I cross-tabulate any given predictor variable with the response variable, I get many cells with 0 counts, making a chi-squared test of independence inappropriate. That's fine, because I can use Fisher's exact test, but it's a problem for calculating effect size, since Cramer's V and every other method I've found that works for nominal data like mine seems to rely on chi-squared. Are there any alternatives to Cramer's V that don't have this problem? Or, if I'm misunderstanding something, is it still valid to use Cramer's V even if a chi-squared test is inappropriate?
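To illustrate the situation (with made-up counts, not my actual data): a sparse table makes chisq.test() warn that its approximation may be incorrect, while fisher.test() still runs.

```r
# Made-up sparse table (not the actual data) illustrating the problem:
# several cells have 0 counts, so expected counts are small.
tab <- matrix(c(12, 0, 3,
                 0, 8, 1,
                 2, 1, 0),
              nrow = 3, byrow = TRUE)

chisq.test(tab)    # warns that the chi-squared approximation may be incorrect
fisher.test(tab)   # the exact test still works
```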
Solved – An alternative to Cramer's V for computing effect size when chi-squared is inappropriate
categorical-data, chi-squared-test, cramers-v, effect-size
Related Solutions
Because your sample size is large, the chi-squared test is likely to return a low p-value even for a table with only small differences from the expected proportions.
To get a sense of the effect size being reported by Cramer's V, it helps to look at the proportions in the table. For example, there is not much difference in the proportions across the grade columns within each row. Row 0 is about one-half to 1 percent of the observations in each grade. Row 2 is roughly 93 to 97 percent of the observations in each grade. And so on.
Whether these kinds of differences in proportions are meaningful in your context is up to you. The p-value and Cramer's V give you certain information; the practical importance of your results is something you will have to decide.
The following is R code.
I am getting a slightly different Cramer's V than you did, so that's something you might want to look into.
Input =("
Col1 Grade1 Grade2 Grade3 Grade4 Grade5
0 290 392 932 1812 2854
1 522 421 574 917 1247
2 56789 81296 117971 147811 204480
3 3719 2975 2811 1704 2244
")
Matrix = as.matrix(read.table(textConnection(Input),
header=TRUE,
row.names=1))
Matrix
chisq.test(Matrix)
### Pearson's Chi-squared test
###
### X-squared = 8113.9, df = 12, p-value < 2.2e-16
library(vcd)
assocstats(Matrix)
### Cramer's V : 0.065
prop.table(Matrix, margin=2)
### Grade1 Grade2 Grade3 Grade4 Grade5
### 0 0.004729289 0.004607212 0.007621353 0.011901947 0.013537294
### 1 0.008512720 0.004948051 0.004693837 0.006023226 0.005914858
### 2 0.926108937 0.955479291 0.964698090 0.970882268 0.969903949
### 3 0.060649054 0.034965446 0.022986720 0.011192559 0.010643899
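For reference, the Cramer's V that assocstats() reports can be reproduced from the chi-squared statistic alone, using V = sqrt(X² / (n · (min(r, c) − 1))). This is a base-R sketch using the same counts as the Matrix built above, with no vcd dependency:

```r
# Reproduce Cramer's V from the chi-squared statistic in base R.
# Same counts as the Matrix built above.
Matrix <- matrix(c(  290,   392,    932,   1812,   2854,
                     522,   421,    574,    917,   1247,
                   56789, 81296, 117971, 147811, 204480,
                    3719,  2975,   2811,   1704,   2244),
                 nrow = 4, byrow = TRUE)

X2 <- chisq.test(Matrix)$statistic               # X-squared = 8113.9, as above
V  <- unname(sqrt(X2 / (sum(Matrix) * (min(dim(Matrix)) - 1))))
round(V, 3)
###  0.065
```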
Complete re-write:
I think the correct approach to calculating Cohen's w is to use the expected values as the P0 values. I looked back at Cohen (1988); this isn't made precisely clear there, but I think that's the intention.
So the problem is that your second case (dat_0_better) doesn't represent the expected values for dat, but your dat_0 does.
chisq.test(dat)$expected
### [,1] [,2]
### [1,] 20 20
### [2,] 30 30
So the calculation of w in the first case, I believe, is correct.†
library(rcompanion)
cohenW(dat)
### Cohen w
### 0.4082
The table that you've constructed with dat includes the information that the control treatment results in 10 out of 50. This is taken into account by the expected values of the table, so I don't think you need to alter the null hypothesis to account for it.
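For what it's worth, w can also be computed in base R as sqrt(X² / n), without rcompanion. Since dat itself isn't reproduced in this excerpt, the 2×2 table below is a hypothetical reconstruction consistent with the expected values [[20, 20], [30, 30]] and the w of 0.4082 shown above:

```r
# Hypothetical 2x2 table consistent with the expected values shown above
# (`dat` itself is not reproduced in this excerpt).
dat <- matrix(c(10, 30,
                40, 20),
              nrow = 2, byrow = TRUE)

chisq.test(dat)$expected                          # [[20, 20], [30, 30]]
X2 <- chisq.test(dat, correct = FALSE)$statistic  # uncorrected chi-squared
w  <- unname(sqrt(X2 / sum(dat)))                 # Cohen's w = sqrt(X^2 / n)
round(w, 4)
###  0.4082
```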
I think what I'm saying is consistent with the standard sample-size calculation. It's a case where those who came before us did the hard work.
† Caveat: I am the author of the rcompanion package. I don't know of another R package that calculates Cohen's w, though I suspect there are some.
Best Answer
The effect sizes I assume you are considering --- Cramer's V, phi, the contingency coefficient C, and Cohen's w --- can all be calculated from the chi-squared value. But the chi-squared statistic is simply calculated from the differences between observed and expected values. This is the way Cohen defines his w in Cohen (1988).
I assume that because there is no inference involved with these statistics, it is fine to report them even if a test using the chi-squared statistic would not be appropriate. It's like reporting the difference between two means as some value, without addressing whether or not a t-test would be appropriate in that case.
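As a sketch of that point (again with made-up sparse counts): the chi-squared statistic, and hence Cramer's V, can still be computed and reported descriptively, while the inference comes from Fisher's exact test.

```r
# Made-up sparse table: the chi-squared *test* would be dubious here,
# but the statistic, and hence Cramer's V, can still be computed.
tab <- matrix(c(9, 0, 2,
                0, 7, 1,
                3, 1, 0),
              nrow = 3, byrow = TRUE)

X2 <- suppressWarnings(chisq.test(tab)$statistic)
V  <- unname(sqrt(X2 / (sum(tab) * (min(dim(tab)) - 1))))
V                           # effect size, reported descriptively
fisher.test(tab)$p.value    # inference from the exact test
```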