The short answer is that separate chi-square or Fisher tests of specific questions are not the same as an overarching GLM that, in effect, asks several questions at once. Whimsically put, each separate test can't know about the other kinds of variation in the data.
I guess most statistical people would, on the information you supply, encourage you to work with an overall model with infected/not infected as a response and species, sex, interaction terms, and whatever else as predictors.
Small frequencies in some cells won't make matters easy, but there will be less adhockery and less of a mess of lots of little tests.
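For concreteness, here is a minimal sketch of such an overall model in Python with statsmodels, on made-up data; the variable names (`species`, `sex`, `infected`) and the choice of a logistic (binomial) GLM are assumptions about your setup:

```python
# Minimal sketch: one overall model instead of many separate chi-square
# tests. Data, column names, and effect sizes here are entirely made up.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: infection status by species and sex, one row per animal.
df = pd.DataFrame({
    "species": ["A"] * 40 + ["B"] * 40,
    "sex":     (["m"] * 20 + ["f"] * 20) * 2,
    "infected": [1] * 8 + [0] * 12 + [1] * 5 + [0] * 15    # species A
              + [1] * 14 + [0] * 6 + [1] * 4 + [0] * 16,   # species B
})

# Logistic regression with species, sex, and their interaction as predictors.
model = smf.logit("infected ~ species * sex", data=df).fit(disp=0)
print(model.summary())
```

The coefficient table from a single fit like this addresses the species, sex, and interaction questions within one framework, instead of a patchwork of little tests.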
You seem to be following the idea that there is a single correct analysis for your data that a statistically competent person should be able to tell you, but the best framework for you should also be chosen in the light of your scientific judgement about what is going on. For example, it's a matter of biological judgement about whether it makes sense to put different species in the same model.
The rule that expected frequencies should exceed 5 is a very conservative one for chi-square tests. In any case, the sensitivity of chi-square tests to small frequencies can be explored computationally rather than treated as a matter of dogma. However, as a GLM is likely to be the better framework here, that is secondary.
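As a sketch of what "explored computationally" can look like, here is a small simulation (settings made up) that generates 2x2 tables under a true null of independence, with deliberately small expected frequencies, and checks how often the Pearson test rejects:

```python
# Simulation sketch: how does the Pearson chi-square test behave when
# expected cell frequencies are small? All settings here are illustrative.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
n, p_row, p_col = 30, 0.2, 0.3      # small sample, unbalanced margins
alpha, n_sims = 0.05, 5000

rejections = valid = 0
for _ in range(n_sims):
    rows = rng.random(n) < p_row     # two independent binary variables,
    cols = rng.random(n) < p_col     # so the null of independence is true
    table = np.array([[np.sum(rows & cols), np.sum(rows & ~cols)],
                      [np.sum(~rows & cols), np.sum(~rows & ~cols)]])
    if table.sum(axis=0).min() == 0 or table.sum(axis=1).min() == 0:
        continue                     # skip tables with an empty margin
    valid += 1
    _, p, _, _ = chi2_contingency(table, correction=False)
    rejections += p < alpha

print(f"empirical size at nominal alpha = 0.05: {rejections / valid:.3f}")
```

Comparing the empirical rejection rate with the nominal 0.05 tells you directly how much damage the small cells do, for your sample sizes rather than someone else's rule of thumb.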
The relationship between the Wald test and the Pearson $\chi^2$ is a particular example of the relationship between Wald tests and score tests.
The Wald test statistic for the difference between a value of a parameter $\hat \theta$ estimated from a data sample and a null-hypothesis value $\theta_0$ is:
$$W = \frac{ ( \widehat{ \theta}-\theta_0 )^2 }{\operatorname{var}(\hat \theta )}$$
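As a small worked example, here is the Wald statistic for a single binomial proportion, where $\hat\theta$ is the sample proportion and the variance is evaluated at the estimate rather than at the null (the numbers are made up):

```python
# Worked Wald statistic for one binomial proportion (made-up numbers):
# theta_hat is the sample proportion, var(theta_hat) = p(1-p)/n evaluated
# at the estimate, and theta_0 is the null value.
n, successes = 100, 62
theta_hat = successes / n                      # 0.62
theta_0 = 0.5
var_hat = theta_hat * (1 - theta_hat) / n      # variance at the estimate
W = (theta_hat - theta_0) ** 2 / var_hat
print(W)  # about 6.11, compared with chi-square on 1 df
```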
The Pearson $\chi^2$ test statistic for differences in a contingency table between a set of observed counts ($O_i$) and those expected ($E_i$) under a null hypothesis such as independence is:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where the sum is over all cells $i$ of the contingency table. In both Wald and Pearson tests, the numerator terms represent squared differences between values found from the data sample and those expected under the null hypothesis.
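A quick sketch of this computation for a made-up 2x2 table, with the expected counts derived from the margins under independence, checked against scipy's implementation:

```python
# Pearson chi-square computed term by term for a made-up 2x2 table,
# then checked against scipy.stats.chi2_contingency.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 30],
                     [25, 25]])
total = observed.sum()
# Expected counts under independence: (row total * column total) / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total

chi2 = ((observed - expected) ** 2 / expected).sum()
chi2_scipy, p, dof, _ = chi2_contingency(observed, correction=False)
print(chi2, chi2_scipy)  # the two agree
```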
The relationship you intuit between the Pearson statistic and the Wald statistic becomes clear if you think about a contingency table as representing a sample from a set of Poisson-distributed count variables, one for each cell. (That's the basis of Poisson regression, also called log-linear modeling, for contingency-table analysis.) For a Poisson distribution the variance equals the mean, so the denominator terms can be considered the estimated variances of counts in each cell of the table under the null hypothesis.
In contrast, the denominator in the single-parameter Wald statistic is the variance around the estimated value of the parameter. For Wald tests of more complicated hypotheses the denominator is related to the covariance matrix of the parameter estimates, still evaluated around the estimated parameter values.
So in both tests the denominator involves a variance estimate, one evaluated at the null hypothesis and the other around the estimated parameter values. This is as @gung explained in this answer about likelihood-ratio, Wald, and score tests; the Pearson test is a particular example of a score test.*
Wald tests are evaluated at the parameter values estimated by maximum likelihood while score tests are evaluated at the null hypothesis.
In practice, statistical software will report all three tests for a full multiple-regression model fit by maximum likelihood, but usually only Wald tests for individual coefficients. That's mostly because the parameter covariance estimates for the Wald test can be obtained directly from the numerical approximations made in fitting the model, while the model has to be re-evaluated with respect to each parameter to get p-values and confidence intervals based on the other tests. That doesn't mean Wald tests are "better" in any sense other than convenience; with small samples, Wald tests are often considered the least reliable.
*This paper by Gordon Smyth provides a proof of how a more general Pearson goodness-of-fit test is equivalent to a score test. Score tests are sometimes called Lagrange-multiplier tests.
The Pearson $\chi^2$ test, the Wald test, (the likelihood ratio test, Rao's score test, ...) are all approximate. If you have an infinite sample, then they will be exactly the same, but in smaller samples you will find differences.
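A quick illustration using scipy's power-divergence family: on the same small made-up table, the Pearson and likelihood-ratio statistics are close but not identical.

```python
# Pearson vs likelihood-ratio (G) statistic on one small made-up table:
# asymptotically equivalent, but different in a finite sample.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 5],
                  [7, 9]])
pearson, p1, _, _ = chi2_contingency(table, correction=False)
lr, p2, _, _ = chi2_contingency(table, correction=False,
                                lambda_="log-likelihood")
print(pearson, lr)  # similar, not equal
```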