If you use a chi-square you can partition it into $2 \times 2$ contrasts - possibly with the aid of collapsing some groups for some of the contrasts. (Your lowest expected values are 3.81 and 4.50, which for most purposes is plenty.)
Alternatively, you could use a chi-square and perform the various $2 \times 2$ post hoc comparisons. If you want them, you can even do the usual kinds of significance level adjustments for those.
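To illustrate the partitioning idea, here is a sketch in R with made-up counts (a hypothetical languages-by-categories table, not your data), picking one $2 \times 2$ contrast by collapsing rows and columns:

```r
## Made-up counts (rows = languages, columns = outcome categories);
## these are illustrative only, not the original data.
tab <- matrix(c(50, 10,  5,
                40, 15, 10,
                 8, 30, 25,
                 5, 28, 30),
              nrow = 4, byrow = TRUE,
              dimnames = list(lang = paste0("L", 1:4),
                              cat  = c("A", "B", "C")))

## Overall test of association:
chisq.test(tab)

## One 2x2 contrast: languages 1-2 vs 3-4, category A vs B and C collapsed.
contrast <- matrix(c(sum(tab[1:2, "A"]), sum(tab[1:2, c("B", "C")]),
                     sum(tab[3:4, "A"]), sum(tab[3:4, c("B", "C")])),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("L1+L2", "L3+L4"), c("A", "B+C")))
chisq.test(contrast)
```

The choice of which rows and columns to collapse is up to you; it should reflect the contrasts that matter for the substantive question.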
If you don't have to have formal tests, you can just look at the Pearson residuals or (the signed square root of) the contribution to chi-square from the independence model to see which cells contributed substantively to significance.
Similar things can be done with the G-test (the likelihood-ratio-test form of chi-square).
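Both the Pearson residuals and the G-test are easy to extract from `chisq.test` output; a sketch with made-up counts (not the original data):

```r
## Made-up 2x3 table, illustrative only (no zero cells, so log() is safe).
tab <- matrix(c(50, 10,  5,
                 8, 30, 25),
              nrow = 2, byrow = TRUE)

ct <- chisq.test(tab)

## Pearson residuals: (observed - expected) / sqrt(expected).
## Cells with large absolute values are the ones driving the statistic.
ct$residuals

## G-test (likelihood-ratio chi-square) from the same observed/expected counts:
G   <- 2 * sum(ct$observed * log(ct$observed / ct$expected))
df  <- ct$parameter              # (rows - 1) * (cols - 1)
p_G <- pchisq(G, df, lower.tail = FALSE)
```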
You might instead fit a multinomial model via GLMs and test whichever contrasts or post hoc comparisons matter for you.
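In base R, one convenient route is the equivalent Poisson log-linear model: the residual deviance of the main-effects model is exactly the G-statistic for independence (`nnet::multinom` would fit the multinomial directly instead). A sketch with made-up counts, not the original data:

```r
## Made-up counts in long format (illustrative only).
d <- expand.grid(lang = c("L1", "L2"), cat = c("A", "B", "C"))
d$count <- c(50, 8, 10, 30, 5, 25)

## Main-effects log-linear model: this is the independence hypothesis.
fit <- glm(count ~ lang + cat, family = poisson, data = d)

## Residual deviance is the likelihood-ratio (G) statistic for independence.
G  <- deviance(fit)
df <- df.residual(fit)           # (rows - 1) * (cols - 1) = 2 here
p  <- pchisq(G, df, lower.tail = FALSE)
```

Contrasts and post hoc comparisons can then be tested by adding the relevant interaction terms and comparing deviances.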
I don't see anything about your problem that is non-standard for counts of categories. The only thing that is even a little unusual is that you have extremely marked differences between languages.
For your data I get Pearson chi-square of $687.8$ with $15$ d.f. for a test of no association between the variables and the P-value is minutely small. For what it's worth, my program (Stata) reports the P-value as about $7 \times 10^{-137}$.
A good program should indeed flag small expected frequencies, which are the issue rather than small observed frequencies: I see a flag that 4 cells have less than 1 as expected frequency. So, there is a bit of a worry about the P-value, but it is really quite secondary. You could change the P-value by more than 100 orders of magnitude either way, but the message would be the same.
To put it directly, a simple test underlines what is evident just by looking at the frequencies, namely that the languages are very different, which you know anyway. If you have some sceptic who doubts that, then a chi-square test provides back-up.
Doing this with Fisher's test is on one level more correct statistically, but it will not change the practical or scientific conclusion one iota.
You have quantitative data that are pertinent to a discussion, but you don't need statistical inference to add gloss. The numbers speak eloquently for themselves, and the details are the interesting part.
Naturally, I am responding to your example, and being firm about what it implies in no way rules out different conclusions for other data.
If there is a predictive model that predicts actual (relative) frequencies, then testing that is a much more interesting question, but you would need to tell us the details.
To respond a little more directly to your question: Fisher's exact test often becomes impractical once the frequencies are no longer very small.
Best Answer
Several procedures in R give much the same result.
Suppose Yes's and No's in four categories are as follows.
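A stand-in table of the right shape (Yes/No counts in four categories, deliberately small in Category 4) might be built as below; the counts are assumed here for illustration, so they will not reproduce the exact P-values quoted later.

```r
## Assumed stand-in Yes/No counts in four categories (not the original data).
TBL <- rbind(Yes = c(18, 7, 5, 2),
             No  = c(22, 13, 10, 3))
colnames(TBL) <- paste0("Cat", 1:4)
TBL
```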
Test to see if proportions of Yes's are the same in all four categories:
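With stand-in counts like those above (assumed, not the original data), the test could be run as:

```r
## Numbers of Yes's and numbers of trials per category (assumed counts).
yes <- c(18, 7, 5, 2)
n   <- c(40, 20, 15, 5)

## H0: the proportion of Yes's is the same in all four categories.
prop.test(yes, n)
```

With these counts the expected Yes count in Category 4 is $5 \times 32/80 = 2 < 5$, so R warns that the chi-squared approximation may be incorrect.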
The warning message is given because of the small counts in Category 4. (Essentially, this test uses a normal approximation, expressed in terms of a chi-squared statistic with 3 DF.)
The `prop.test` procedure in R is essentially the same as a chi-squared test of homogeneity on `TBL`, without the Yates continuity correction. In this version of the test, it is easier to show explicitly why the P-value may be incorrect. The warning message is given whenever any one of the expected counts in the chi-squared test is smaller than $5.$ Here is how to display the table of expected counts. Notice that the P-value is exactly the same as above.
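A sketch, again on assumed stand-in counts rather than the original data:

```r
## Assumed stand-in table (not the original data).
TBL <- rbind(Yes = c(18, 7, 5, 2),
             No  = c(22, 13, 10, 3))

## Chi-squared test of homogeneity; warns because some expected counts are < 5.
ct <- chisq.test(TBL)
ct$p.value

## Table of expected counts under homogeneity:
ct$expected
```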
However, as implemented in R, it is possible to simulate a more accurate P-value (using the parameter `sim=T`, a partial match for `simulate.p.value=TRUE`). Notice the (slight) change in the simulated P-value.

Traditionally, Fisher's exact test (based on a hypergeometric distribution according to marginal counts) was limited to $2 \times 2$ tables with relatively small counts. However, its implementation in R can use larger tables (within limits of available computer memory to do the computations). The table `TBL` is as suggested in one of @Dave's Comments.

Note: For the fictitious data used here, all tests are significant at the 7% level, but not at the 5% level. Often the simulated P-value of the chi-squared test of homogeneity and the P-value of Fisher's exact test differ from each other more than they do for these data.
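Both follow-ups, sketched on the same assumed stand-in table (so the P-values will not match the ones discussed above):

```r
## Assumed stand-in table (not the original data).
TBL <- rbind(Yes = c(18, 7, 5, 2),
             No  = c(22, 13, 10, 3))

## Simulated (Monte Carlo) P-value; `sim = T` partially matches the
## full argument name `simulate.p.value`.
set.seed(2024)
chisq.test(TBL, sim = T)

## Fisher's exact test; R's implementation accepts tables larger than 2x2.
fisher.test(TBL)
```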