The massive 58 amid much lower frequencies signals that any test will just be quantifying a major failure of independence. I did this in Stata. Both tests yield P-values that display as 0.000 to 3 d.p.; the command ret li (short for return list) obliges Stata to show the results to full precision. It is right to be a little cautious about low expected values (for row 1 here in particular), but the test results are overwhelming.
. tabi 0 2 \ 5 58 \ 4 3 \ 4 3

           |         col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         0          2 |         2
         2 |         5         58 |        63
         3 |         4          3 |         7
         4 |         4          3 |         7
-----------+----------------------+----------
     Total |        13         66 |        79

          Pearson chi2(3) =  20.5779   Pr = 0.000

. ret li

scalars:
                  r(p) =  .0001288081813192
               r(chi2) =  20.57794057794058
                  r(c) =  2
                  r(r) =  4
                  r(N) =  79
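For anyone wanting a cross-check outside Stata, here is a sketch in Python, assuming scipy is available; chi2_contingency reproduces the Pearson test and also returns the expected frequencies behind the caution about row 1.

```python
# Cross-check of the Pearson chi-square test from tabi, using scipy.
from scipy.stats import chi2_contingency

table = [[0, 2], [5, 58], [4, 3], [4, 3]]

# correction=False asks for the plain Pearson statistic (the Yates
# correction would only apply to a 2 x 2 table anyway).
chi2, p, df, expected = chi2_contingency(table, correction=False)

print(round(chi2, 4))  # 20.5779, matching r(chi2)
print(df)              # 3
print(p)               # matches r(p), about .000129
print(expected[0])     # row 1 expected counts, about [0.33, 1.67]
```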
. tabi 0 2 \ 5 58 \ 4 3 \ 4 3 , exact

Enumerating sample-space combinations:
stage 4:  enumerations = 1
stage 3:  enumerations = 3
stage 2:  enumerations = 17
stage 1:  enumerations = 0

           |         col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         0          2 |         2
         2 |         5         58 |        63
         3 |         4          3 |         7
         4 |         4          3 |         7
-----------+----------------------+----------
     Total |        13         66 |        79

           Fisher's exact =                 0.000

. ret li

scalars:
            r(p_exact) =  .0003124258226793
                  r(c) =  2
                  r(r) =  4
                  r(N) =  79
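scipy's fisher_exact handles only 2 x 2 tables (at least in the versions I know), but for a 4 x 2 table the enumeration Stata performs is small enough to write out directly. Here is a sketch in Python, a cross-check of my own rather than anything from the original run: with the margins fixed, the first column follows a multivariate hypergeometric distribution, and the two-sided P-value sums the probabilities of all tables no more probable than the observed one.

```python
# Fisher's exact test for the 4 x 2 table above by complete
# enumeration -- a sketch cross-checking Stata's r(p_exact), not
# Stata's own network algorithm.
from itertools import product
from math import comb

rows = [2, 63, 7, 7]   # row totals
c1 = 13                # first-column total (the second is forced)
obs = (0, 5, 4, 4)     # observed first column

def weight(a):
    # Numerator of the multivariate hypergeometric probability of
    # first column a; the denominator comb(N, c1) is common to all
    # tables with these margins, so exact integer weights suffice.
    w = 1
    for r, x in zip(rows, a):
        w *= comb(r, x)
    return w

w_obs = weight(obs)
tail = 0
for a in product(*(range(min(r, c1) + 1) for r in rows)):
    if sum(a) != c1:
        continue
    w = weight(a)
    if w <= w_obs:  # table no more probable than the observed one
        tail += w

p = tail / comb(sum(rows), c1)
print(p)   # small, of the order of Stata's r(p_exact)
```

Exact integer weights sidestep any floating-point tie-breaking in the comparison with the observed table.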
I don't see anything about your problem that is non-standard for counts of categories. The only thing that is even a little unusual is that you have extremely marked differences between languages.
For your data I get Pearson chi-square of $687.8$ with $15$ d.f. for a test of no association between the variables and the P-value is minutely small. For what it's worth, my program (Stata) reports the P-value as about $7 \times 10^{-137}$.
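As an aside on magnitude (again a Python cross-check with scipy rather than anything from the original analysis): P-values this small are still perfectly representable in double precision.

```python
# Upper-tail probability of chi-square with 15 d.f. at 687.8;
# doubles reach down to about 5e-324, so there is no underflow here.
from scipy.stats import chi2

p = chi2.sf(687.8, df=15)
print(p)   # about 7e-137, as quoted above
```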
A good program should indeed flag small expected frequencies, which are the issue rather than small observed frequencies: I see a flag that 4 cells have expected frequencies below 1. So there is a bit of a worry about the P-value, but it is really quite secondary. You could change the P-value by more than 100 orders of magnitude either way and the message would be the same.
To put it directly, a simple test underlines what is evident just by looking at the frequencies, namely that the languages are very different, which you know anyway. If you have some sceptic who doubts that, then a chi-square test provides back-up.
Doing this with Fisher's test is on one level more correct statistically, but it will not change the practical or scientific conclusion one iota.
You have quantitative data that are pertinent to a discussion, but you don't need statistical inference to add gloss. The numbers speak eloquently for themselves, and the details are the interesting part.
Naturally, I am responding to your example, and being firm about what it implies in no way rules out different conclusions for other data.
If there is a predictive model that predicts actual (relative) frequencies, then testing that is a much more interesting question, but you would need to tell us the details.
To respond a little more directly to your question: Fisher's exact test is often impractical once the frequencies stop being very small.
Best Answer
It takes more time to post the question than to try it out; the Stata runs above show what happens.
You could report the exact P-value as 0.0003 or 0.00031, say. So Fisher's test can certainly be done for tables this size. The Pearson chi-square test gives a P-value of 0.00013, which every statistically minded person I know would regard as essentially the same message. The tests support the interpretation that is evident from eyeballing the table: an association between the row and column variables.