The R function chisq.test allows a 'vector of probabilities' to be set with the p argument. Is there an equivalent for fisher.test()?
I'm trying to test whether there's a significant difference between the proportion of people in three unequally-sized groups who studied STEM subjects at university.
I get a "Chi-squared approximation may be incorrect" warning, which I think is due to the small samples, so I thought I'd try Fisher's.
As the three groups are unequal sizes, when using chisq.test I'm feeding in the probability distribution. I'm not sure how to do this with fisher.test.
There's an R-Fiddle of the Chi-square implementation here.
My null hypothesis is that none of the three groups have a higher proportion of STEM (Science, Technology, Engineering, Maths) graduates than any other.
Group one constitutes 195 people (or 53% of the sample, i.e. the 0.5313351499 in the R-Fiddle), Group two is 134 people (or 37%) and Group three is 38 people (or 10%). Group one has 22 STEM graduates, Group two has 16 and Group three has 9.
Best Answer
What you did isn't the way you run a chi-squared test. You need a contingency table. For each group, you have some people in STEM fields and some people who aren't. Thus, you will have two rows of counts, or two cells per group. Then you run a chi-squared test of the independence of the rows and columns. Here is a slightly edited version of your data:
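The table itself is not reproduced here. A minimal sketch of what it might look like, rebuilt from the counts in the question (the non-STEM counts are each group's size minus its STEM graduates; the object name tab and the row and column labels are illustrative assumptions):

```r
# Counts taken from the question; object and label names are illustrative.
group_size <- c(one = 195, two = 134, three = 38)
stem       <- c(one = 22,  two = 16,  three = 9)

# Two rows (STEM / non-STEM) and one column per group.
tab <- rbind(STEM = stem, nonSTEM = group_size - stem)
tab
```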
Now you can run your test:
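A sketch of that call, assuming the 2 x 3 table of counts from the question is stored in an object called tab (an illustrative name, not necessarily the one used originally):

```r
# STEM and non-STEM counts per group, taken from the question.
tab <- rbind(STEM = c(22, 16, 9), nonSTEM = c(173, 118, 29))
colnames(tab) <- c("one", "two", "three")

# Pearson chi-squared test of independence between rows and columns.
# No p argument is needed: the group sizes enter through the column totals.
res <- chisq.test(tab)
res
```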
This yields the warning that you saw. As a rule of thumb, it is generally recommended that the expected count for each cell under the null hypothesis be at least 5. However, it has been shown that this rule is overly conservative, and the chi-squared test is robust even when it isn't strictly met. We can examine your expected counts like so:
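One way to pull out the expected counts, sketched under the assumption that the counts from the question are stored in a matrix called tab (an illustrative name), is to read the expected component of the object that chisq.test returns:

```r
# STEM and non-STEM counts per group, taken from the question.
tab <- rbind(STEM = c(22, 16, 9), nonSTEM = c(173, 118, 29))
colnames(tab) <- c("one", "two", "three")

# Expected counts under independence: row total * column total / grand total.
# suppressWarnings() only hides the approximation warning while extracting them.
expected <- suppressWarnings(chisq.test(tab))$expected
expected

# Smallest expected count: STEM graduates in group three, 38 * 47 / 367.
min(expected)
```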
Your minimum expected count is 4.866485, and all the others are >5. Realistically, this is nothing to bother over. However, if you are concerned about it, you can just simulate the p-value instead of using the chi-squared approximation. Here is the chi-squared test using that option:
As you can see, the p-value is essentially the same.
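The simulated-p-value option mentioned above can be sketched as follows. The object name tab, the seed and the number of replicates B are illustrative assumptions. Since the question asked about fisher.test(), the sketch also shows that it accepts the same table; like chisq.test, it needs no p argument here because the unequal group sizes are already encoded in the table's column totals.

```r
# STEM and non-STEM counts per group, taken from the question.
tab <- rbind(STEM = c(22, 16, 9), nonSTEM = c(173, 118, 29))
colnames(tab) <- c("one", "two", "three")

set.seed(1)  # arbitrary seed, only so the simulation is reproducible

# Monte Carlo p-value instead of the chi-squared approximation.
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
sim$p.value

# Fisher's exact test on the same 2 x 3 table.
fis <- fisher.test(tab)
fis$p.value
```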