For context: I should note in advance I am a relative beginner with this.
Data context
I have data on some 600 000 persons, including a column recording whether each person took parental leave (coded as 1 = took parental leave, 0 = took no parental leave). I also have a column coding each person as male or female. I want to know whether persons coded as female are more likely to take parental leave than persons coded as male.
So I made a 2×2 table (female/male; no parental leave/parental leave) and applied a chi-square test, which is significant (as expected). The residuals and the proportion table show that women are indeed overrepresented among those taking parental leave. So far so good.
Problem statement
However, the effect size is relatively small (Cramer's V about 0.15). For a number of reasons this seems counterintuitive: the difference between men and women in the 'parental leave = 1' group seems quite large. I googled and read a bit about effect size and unbalanced groups. In this case the dataset is large, and only a relatively small proportion of the 600 000 persons took parental leave. Could this affect the effect size? If so, is there a measure other than Cramer's V that should be used here?
Note: I am not specifically looking for a large effect size, just wondering whether I am applying the right measure.
Own research
I have read the post "Chi-square Test with High Sample Size and Unbalanced Data", but it didn't quite answer my question (the issue seems similar, though).
Best Answer
Answer from comment thread:
It sounds like you are describing the most useful effect size for your situation:
If I understand what you are saying, this is the odds ratio.
For future readers, as an example of a 2×2 table with Cramer's V = 0.15 and OR = 2, the following is code in R:
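A minimal sketch of such a table in base R. The counts are made up, chosen so that the odds ratio is exactly 2 and Cramer's V comes out near 0.15; they are illustrative assumptions, not the poster's data and not necessarily the numbers the answerer had in mind. The variable names (`tab`, `cramers_v`, `odds_ratio`) are likewise just for this example.

```r
# Hypothetical counts (NOT the real data): rows = sex, columns = leave status
tab <- matrix(c(1000, 500,
                1200, 300),
              nrow = 2, byrow = TRUE,
              dimnames = list(sex   = c("female", "male"),
                              leave = c("no", "yes")))

# Pearson chi-square statistic, without Yates' continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(chi2 / (n * (min(dim(tab)) - 1)))

# Odds ratio: odds of leave among females divided by odds of leave among males
odds_ratio <- (tab["female", "yes"] / tab["female", "no"]) /
              (tab["male", "yes"] / tab["male", "no"])

print(round(cramers_v, 3))  # about 0.15
print(odds_ratio)           # 2
```

In this made-up table the odds of taking leave are 0.5 for women and 0.25 for men, so the odds are doubled even though Cramer's V is only about 0.15. The odds ratio compares the two groups' odds directly, whereas Cramer's V also depends on the marginal proportions of the table, which is one reason the two measures can give such different impressions.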