Solved – Can Fisher’s exact test accept a ‘vector of probabilities’?

chi-squared-test fishers-exact-test r

The R function chisq.test allows a 'vector of probabilities' to be set with the p argument. Is there an equivalent for fisher.test()?

I'm trying to test whether there's a significant difference between the proportion of people in three unequally-sized groups who studied STEM subjects at university.

I get a "Chi-squared approximation may be incorrect" warning, which I think is due to the small samples, so I thought I'd try Fisher's exact test.

As the three groups are of unequal size, I'm feeding the probability distribution into chisq.test via the p argument. I'm not sure how to do this with fisher.test.

There's an R-Fiddle of the Chi-square implementation here.

My null hypothesis is that none of the three groups have a higher proportion of STEM (Science, Technology, Engineering, Maths) graduates than any other.

Group one is 195 people (53% of the sample, i.e. the 0.5313351499 in the R-Fiddle), Group two is 134 people (37%) and Group three is 38 people (10%). Group one has 22 STEM graduates, Group two has 16 and Group three has 9.

Best Answer

What you did isn't the way you run a chi-squared test for this question. The p argument of chisq.test is for a goodness-of-fit test on a single vector of counts; to compare proportions across groups you need a contingency table. For each group, you have some people in STEM fields and some people who aren't. Thus, you will have two rows of counts, or two cells per group. Then you run a chi-squared test of the independence of the rows and columns. Here is a slightly edited version of your data:

totals                      <- c(195, 134, 38)
stems                       <- c(22,16,9)
group_stem_counts           <- matrix(c(stems, totals-stems),ncol=3,byrow=TRUE)
rownames(group_stem_counts) <- c("stem", "non-stem")
colnames(group_stem_counts) <- c("Group One","Group Two","Group Three")
group_stem_counts
#          Group One Group Two Group Three
# stem            22        16           9
# non-stem       173       118          29
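
Before testing, it can help to look at the proportions that the test will compare. This is only an illustrative sketch: stem_share is a name made up here, and the totals and STEM counts are the ones given in the question.

```r
# Group sizes and STEM counts as given in the question
totals <- c(195, 134, 38)
stems  <- c(22, 16, 9)

# Proportion of STEM graduates within each group
stem_share <- stems / totals
names(stem_share) <- c("Group One", "Group Two", "Group Three")
round(stem_share, 3)
```

The shares come out at roughly 11%, 12% and 24%, so the question is whether Group Three's higher share is more than chance would produce in a group of 38.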

Now you can run your test:

chisq.test(group_stem_counts)
# 
#         Pearson's Chi-squared test
# 
# data:  group_stem_counts
# X-squared = 4.5225, df = 2, p-value = 0.1042
# 
# Warning message:
# In chisq.test(group_stem_counts) :
#   Chi-squared approximation may be incorrect

This yields the warning that you saw. As a rule of thumb, it is generally recommended that the expected count for each cell under the null hypothesis be at least 5. However, it has been shown that this rule is overly conservative, and the chi-squared test is robust even if that isn't exactly the case. We can examine your expected counts like so:

chisq.test(group_stem_counts)$expected
#          Group One Group Two Group Three
# stem      24.97275  17.16076    4.866485
# non-stem 170.02725 116.83924   33.133515
# Warning message:
# In chisq.test(group_stem_counts) :
#   Chi-squared approximation may be incorrect
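
If it helps to see where those numbers come from: under independence, each expected count is (row total × column total) / grand total. The sketch below rebuilds the same table so that it runs on its own; expected_by_hand is just an illustrative name.

```r
# Same 2 x 3 table as above, rebuilt so this snippet is self-contained
group_stem_counts <- matrix(
  c(22, 16, 9,
    173, 118, 29),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("stem", "non-stem"),
                  c("Group One", "Group Two", "Group Three"))
)

# Expected count per cell = row total * column total / grand total
expected_by_hand <- outer(rowSums(group_stem_counts),
                          colSums(group_stem_counts)) / sum(group_stem_counts)
expected_by_hand

# Smallest expected count: the stem cell of Group Three, 47 * 38 / 367
min(expected_by_hand)
```

This should reproduce the $expected matrix printed above without triggering the warning, since no test is run.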

Your minimum expected count is 4.866485, and all the others are >5. Realistically, this is nothing to bother over. However, if you are concerned about it, you can just simulate the p-value instead of using the chi-squared approximation. Here is the chi-squared test using that option:

chisq.test(group_stem_counts, simulate.p.value=TRUE)
# 
#         Pearson's Chi-squared test with simulated p-value (based on 2000
#         replicates)
# 
# data:  group_stem_counts
# X-squared = 4.5225, df = NA, p-value = 0.1184

As you can see, the p-value is essentially the same.
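
To come back to the original question: as far as I know, fisher.test has no p argument, and it doesn't need one. Fisher's exact test conditions on the row and column totals of the contingency table, so the unequal group sizes are already accounted for by the column totals. A minimal sketch, rebuilding the same table so the snippet stands alone (fisher_result is an illustrative name):

```r
# Same 2 x 3 table as above, rebuilt so this snippet is self-contained
group_stem_counts <- matrix(
  c(22, 16, 9,
    173, 118, 29),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("stem", "non-stem"),
                  c("Group One", "Group Two", "Group Three"))
)

# fisher.test accepts r x c tables, not only 2 x 2 ones
fisher_result <- fisher.test(group_stem_counts)

# Exact p-value for independence of STEM status and group
fisher_result$p.value
```

For tables larger than 2 x 2, fisher.test reports only a p-value (no odds ratio or confidence interval), which is all you need for this comparison.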
