Some immediate responses:
1) Your lecturer means that the data show autocorrelation. This leads to inefficient estimates of regression coefficients in simple linear regression. Whether that counts as a mistake depends on whether autocorrelation was covered in your course.
2) Maybe I do not understand the problem fully, but IMHO the chi-squared test of independence is used correctly here, apart from two other issues:
3) Your chi-squared test has immense power because of the sample size: it is hard not to reach significance even if the effects are very small. Furthermore, it appears you have a census of the population. In that situation statistical inference is unnecessary, because you observe all population units. But that is not what the lecturer remarks on.
4) You seem to aggregate the data across time points. You should instead test once per time point; otherwise you aggregate effects over time and count units multiple times. But that is also not what the lecturer remarks on.
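Point 3) is easy to demonstrate numerically. The sketch below (standard library only; the counts are invented) applies Pearson's $\chi^2$ to the same tiny effect, a 51% vs. 49% split, at two sample sizes. For a 2×2 table the statistic has 1 degree of freedom, so the tail probability is $\operatorname{erfc}(\sqrt{x/2})$:

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (df = 1) for the 2x2 table
    [[a, b], [c, d]], using the shortcut stat = n(ad - bc)^2 / (product of
    the four marginal totals) and p = erfc(sqrt(stat / 2)) for 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))

# The same tiny effect (51% vs. 49%) at two sample sizes:
small = chi2_2x2(51, 49, 49, 51)                   # n = 200
large = chi2_2x2(51_000, 49_000, 49_000, 51_000)   # n = 200,000

print(f"n = 200:     chi2 = {small[0]:.3f}, p = {small[1]:.3f}")
print(f"n = 200,000: chi2 = {large[0]:.3f}, p = {large[1]:.2e}")
```

With n = 200 the effect is nowhere near significant (p ≈ 0.78); with n = 200,000 the identical proportions give an overwhelmingly significant result, which is why near-census data almost always "pass" a significance test.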
The lecturer actually remarks that you want to test the null of homogeneity, whereas you test the null of independence. So what does he mean by homogeneity?
I suppose he refers to the test of marginal homogeneity in paired data. That test assesses whether there was a change across time (repeated measures), which is not what you want to assess in the first place. My guess is that he did not understand that you want to test whether gender and employment at time point x are related. Perhaps he was also suggesting that what you should test is change across time (or no change, in which case the repeated contingency tables would indeed be called homogeneous).
Pearson's $\chi^2$ test is useful for a sample of $n$ observations cross-classified by two variables, say $A$ and $B$. It tests the null hypothesis that $A$ and $B$ are independent. For example, if you crossed two strains of D. melanogaster (fruit flies) carrying different mutations and observed the frequencies in $n$ $F_2$ progeny, the $\chi^2$ test tests for linkage of the two traits (i.e., are they on different chromosomes [the null] or on the same chromosome [i.e., linked, the alternative]).
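As a concrete sketch of that fruit-fly setup (the counts below are invented, chosen to sit near the 9:3:3:1 ratio expected under independent assortment), the test compares each observed cell with the count expected under independence, $E_{ij} = (\text{row}_i \times \text{col}_j)/n$:

```python
import math

# Hypothetical F2 progeny cross-classified by the phenotypes of two traits
# (counts invented for illustration):
observed = [[315, 108],   # trait A dominant:  trait B dominant / recessive
            [101,  32]]   # trait A recessive: trait B dominant / recessive

n = sum(sum(row) for row in observed)
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]

# Expected counts under the null of independence: E_ij = row_i * col_j / n.
expected = [[r * c / n for c in col_tot] for r in row_tot]

# Pearson statistic: sum over cells of (O - E)^2 / E.
stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# A 2x2 table has df = 1, and the chi-squared(1) tail is erfc(sqrt(x/2)).
p = math.erfc(math.sqrt(stat / 2))
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Here the small statistic and large p-value are consistent with the null of no linkage: the traits assort independently.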
McNemar's test is used for paired data -- that is, each observation represents a pair of values. For example, consider a set of $n$ lung-cancer patients, each with a spouse. You record the smoking habits of the patients and their spouses, and cross-classify. Pearson's test would appear to have $2\,n$ observations, but in this case you have only $n$ pairs. McNemar's test makes this correction. The hypothesis tested is similar: "Is cancer status related to smoking status?"
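A minimal, standard-library-only sketch of the exact form of McNemar's test, with invented patient/spouse counts: only the two discordant cells of the paired 2×2 table carry information, and under the null they split Binomial(n, 1/2):

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact McNemar test on the two discordant cells of a paired 2x2 table.
    Under the null, the b discordant pairs of one kind among the n = b + c
    discordant pairs follow Binomial(n, 1/2); the two-sided p-value doubles
    the smaller tail (capped at 1)."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical smoking table (invented numbers):
#                    spouse smokes   spouse doesn't
# patient smokes        a = 25          b = 15
# patient doesn't       c = 5           d = 55
# The concordant cells a and d drop out; only b and c are informative.
print(mcnemar_exact(15, 5))
```

With 15 vs. 5 discordant pairs the two-sided p-value is about 0.041, so the patients' and spouses' smoking rates differ (the margins are not homogeneous).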
I suppose one could think of this as a "between subjects" versus "within subjects" difference, and the two are doubtless similar. I don't see them that way myself, but I'll confess to not having thought about it much.
Regarding your Question 2, the restriction is on expected cell counts, not observed cell counts. Observed counts are reality, while expected cell counts represent a model. You can think of the restrictions as helping to ensure a decent approximation under the null hypothesis. Reality can (and should) diverge from the model when necessary, but if the model is approximately correct, it would be bad to have a situation where discrepancies get inflated in small cells.
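A tiny invented example makes the observed-versus-expected distinction concrete: here two observed cells hold only 2, yet every expected count is exactly 5, and it is the expected counts that the usual "at least 5" rule of thumb inspects:

```python
# A 2x2 table where two observed cells are small (2) but all four expected
# counts, computed from the margins as E_ij = row_i * col_j / n, equal 5:
observed = [[8, 2],
            [2, 8]]

n = sum(sum(row) for row in observed)            # 20
row_tot = [sum(row) for row in observed]          # [10, 10]
col_tot = [sum(col) for col in zip(*observed)]    # [10, 10]

expected = [[r * c / n for c in col_tot] for r in row_tot]
print(expected)   # the rule of thumb looks at these values, not at the 2s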
Finally, an exact test is precisely what it says it is. The distribution of the test statistic under the null hypothesis is known exactly. Pearson's $\chi^2$, McNemar's test, and the log-likelihood $\chi^2$ are all based on asymptotic approximations to the distribution of the test statistic under the null hypothesis. Fisher's test, by comparison, notes that conditionally on the marginal totals, the distributions in the two cells of any row (or column) of the table follow a hypergeometric distribution. This insight permits computation of an exact observed significance level ($p$-value) for any given number of observations in the $1, 1$ cell.
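That hypergeometric insight is short enough to code directly. The sketch below is a standard-library-only illustration (not a replacement for a library routine), applied to Fisher's own "lady tasting tea" table, in which the taster correctly identified 3 of 4 milk-first cups:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].
    Conditional on the margins, the (1,1) count follows a hypergeometric
    distribution; sum the probabilities of every table no more probable
    than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # P(X = x) for X ~ Hypergeometric(n, r1, c1)
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible (1,1) counts
    # Small slack guards against float ties when comparing probabilities.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# The "lady tasting tea": 3 of 4 milk-first cups identified correctly.
print(fisher_exact_p(3, 1, 1, 3))
```

The resulting p-value is 34/70 ≈ 0.486, so 3 correct out of 4 is unimpressive evidence; she would have needed all 4 correct (p = 2/70 one-sided) to persuade Fisher.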
Fisher's exact test tests the same null as Pearson's $\chi^2$ and can be used whenever Pearson's is appropriate, as well as in situations where Pearson's approximation is believed to be unreliable. Pearson's test also makes use of the information in the marginal totals, and so is likewise conditional on those totals. Knowing the margins a priori (or even one margin) is unnecessary.
One possible reference might be to section 2.4 of Alan Agresti's "An Introduction to Categorical Data Analysis"[1]. It might be worth checking if that has enough of what you need.
[1]: Agresti, A. (2007), An Introduction to Categorical Data Analysis, John Wiley & Sons, Hoboken, NJ.