I need to do a cell chi-square test, but googling availeth me not (I suspect because of all the pages about cells in general chi-square discussions). How does one calculate this? Any good sources on it, or on doing it in R specifically?
@whuber is right that only the expected counts matter. Reading some of the threads returned by his search may help you; I discuss the issue here: "For chi-square on any 2 by X contingency table, should no more than 20% of the cells be less than 5?" You can also see the point made under "Expected cell count" on the Wikipedia page.
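To make the cell-wise idea concrete, here is a minimal Python/scipy sketch on an invented 2×3 table (not the asker's data). It breaks the overall chi-square statistic into per-cell contributions and standardized residuals; in R the same quantities come out of `chisq.test` as `$expected`, `$residuals`, and `$stdres`.

```python
# Cell-wise breakdown of a chi-square test of independence.
# The observed table below is purely illustrative.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 20, 10],
                     [20, 25, 25]])

stat, p, df, expected = chi2_contingency(observed, correction=False)

# Each cell's contribution to the overall chi-square statistic;
# these sum exactly to `stat`.
contrib = (observed - expected) ** 2 / expected

# Pearson residuals; cells beyond roughly +/-2 are the ones
# driving the overall result.
resid = (observed - expected) / np.sqrt(expected)

print(stat, p)
print(contrib)
print(resid)
```

Inspecting `contrib` and `resid` after a significant overall test shows which cells deviate most from independence, which is usually what a "cell chi-square" question is after.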
Note that the expected cell count is the cell probability times N. Your effects are so small that a huge N will be required. Here are screen shots of G*Power using your ratio of n's, and using equal n's:
Your lowest probability is $3.3\%$, and your lowest count is $6187$, meaning that the smallest expected count would be $204\gg 5$. Thus, @rvl is right: you won't have to worry about that assumption. (I suspect you will have to worry about needing so much data to get reasonable power.)
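The arithmetic behind that claim is just the rule of thumb stated above: expected count = cell probability × N, which must comfortably exceed the usual floor of 5.

```python
# Expected-count check using the figures from the answer:
# lowest cell probability 3.3%, lowest group size 6187.
p_min = 0.033
n_min = 6187

expected_min = p_min * n_min  # ~204, far above the rule-of-thumb floor of 5
print(expected_min)
```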
a) These tests test essentially different things, so you shouldn't be surprised that they give different results. If you look at the data, defect B has a distinctive pattern, namely a minimum in shift 3 where the others have their maximum (by some distance). This is probably the main source of the significant results in the first two tests. In the third test B is aggregated with C and D, and that special shape is no longer visible. Test 1 tests the null hypothesis that "all four defects behave the same over shifts", and B clearly doesn't. Test 3 no longer tests anything about B in isolation; B is lumped together with C and D, and together they don't look that different from A over the shifts, so the test no longer finds anything significant. Such effects happen generally when aggregating categories, although it may not make much of a difference as long as the aggregated categories are statistically similar.
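The dilution effect of aggregation can be demonstrated on invented numbers (the asker's real counts are not shown here, so the table below is purely hypothetical): B dips in shift 3 while the other defects peak there, and lumping B with C and D washes that pattern out.

```python
# Hypothetical defect-by-shift counts, invented for illustration.
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([
    [20, 25, 40],   # defect A
    [35, 30,  5],   # defect B: minimum in shift 3, unlike the others
    [22, 24, 38],   # defect C
    [18, 26, 35],   # defect D
])

# Full 4x3 table: B's distinctive row drives a clear rejection.
stat_full, p_full, *_ = chi2_contingency(counts)

# Lump B with C and D (as in the third test): B's pattern is diluted
# by the similar rows C and D, and significance can disappear.
lumped = np.vstack([counts[0], counts[1:].sum(axis=0)])
stat_lump, p_lump, *_ = chi2_contingency(lumped)

print(p_full, p_lump)  # small p for the full table, larger for the lumped one
```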
b) Testing too much isn't always advisable. For example, if you run one test at the 5% level and the null hypothesis is true, you have a 5% probability of falsely finding significance. If you run three tests, the probability of finding significance in at least one of them, when all null hypotheses are true, is obviously larger than 5%, since every single test already has a 5% type I error probability on its own. It can be up to 15%, but the exact value depends on how the tests depend on each other. I would normally run only one test and, in case of significance, look at the numbers to see what is the most striking thing causing the dependence between rows and columns. (B's minimum at shift 3 is the most striking thing for me here.) Admittedly I sometimes run more tests out of curiosity as well, but then I take the results with a grain of salt. Certainly this is not how big scientific discoveries are made!
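The "up to 15%" figure is the union (Bonferroni) bound; for fully independent tests the family-wise error rate is $1-(1-\alpha)^k$, slightly below that bound:

```python
# Family-wise error rate for k = 3 tests at the 5% level.
alpha = 0.05
k = 3

union_bound = k * alpha               # Bonferroni upper bound: 0.15
independent = 1 - (1 - alpha) ** k    # exact value if the tests are independent

print(union_bound, independent)       # dependence pulls the true rate below the bound
```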
c) The 5% cutoff is pretty artificial, and looking at p-values can't hurt. Your results seem fairly unambiguous, but if, for example, the 5% critical value of the test statistic is 5.99 and you observe 5.92, then despite not being significant you can take this as some (albeit weak) indication that something may be going on, evidence about as weak or strong as observing 6.01, say. I'd only pay strong attention to such a cutoff if a black-or-white decision is really needed.
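Converting those statistics to p-values shows how little separates them. The critical value 5.99 corresponds to a chi-square distribution with 2 degrees of freedom, which is assumed here:

```python
# Chi-square statistics just below and just above the 5% critical value,
# assuming df = 2 (where the 5% critical value is about 5.99).
from scipy.stats import chi2

df = 2
crit = chi2.ppf(0.95, df)     # ~5.991
p_below = chi2.sf(5.92, df)   # just above 0.05
p_above = chi2.sf(6.01, df)   # just below 0.05

print(crit, p_below, p_above)
```

The two p-values differ by only a few thousandths, which is the point: 5.92 and 6.01 carry essentially the same strength of evidence.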