@John has a nice answer. I particularly like the discussion about fishing expeditions and how alpha adjustment may not be necessary. I want to add one aspect to this discussion.

With hypothesis testing, there are two kinds of errors to worry about: type I and type II (also called alpha error and beta error). Both are bad, and we want to avoid both. When people talk about alpha adjustment, they focus only on the possibility of type I errors (that is, saying there is a difference when there isn't one). However, adjusting alpha to minimize type I errors necessarily decreases power, and thus necessarily increases the probability of type II errors (that is, saying there isn't a difference when in fact there is). It is also worth noting that, a priori, there is no reason to believe that type I errors are worse than type II errors (despite the fact that everyone seems to assume this must be so). Rather, which is worse varies from situation to situation and is a judgment the researcher must make. In other words, when deciding on a strategy for testing multiple comparisons (e.g., an alpha-adjustment strategy), you must consider the effect of the strategy on both type I and type II errors, and balance these effects against the severity of each error, how much data you have, and the cost of gathering more.
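The power cost is easy to see in a quick simulation (my own illustration, with made-up numbers: two groups of 30, a true shift of half a standard deviation, and an alpha tightened as if Bonferroni-correcting for 10 tests):

```python
# Sketch: tightening alpha (as a Bonferroni-style adjustment would) cuts
# type I errors but also cuts power, i.e. raises the type II error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, true_shift = 2000, 30, 0.5  # a real effect exists

rejections_raw, rejections_adj = 0, 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_shift, 1.0, n_per_group)
    p = stats.ttest_ind(a, b).pvalue
    rejections_raw += p < 0.05        # unadjusted alpha
    rejections_adj += p < 0.05 / 10   # Bonferroni alpha for 10 tests

power_raw = rejections_raw / n_sims   # roughly 0.45-0.50 here
power_adj = rejections_adj / n_sims   # clearly lower
print(power_raw, power_adj)
```

Every rejection here is a correct one (the effect is real), so the gap between the two rates is pure type II cost of the adjustment.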
On a different note, from your description it seems to me that your situation would best be analyzed with a factorial ANOVA, with sex as factor 1, marital status as factor 2, language as factor 3, and age as factor 4. From the description (and I recognize that it is sparse) I don't see why a cell-means approach (i.e., one-way ANOVA) would be preferable. If you have no interest in interactions, the main effects from the factorial ANOVA are already orthogonal (at least if the $n$s are equal), and Bonferroni corrections are not relevant. It would certainly still be possible to have more than 5% type I errors, but I'm a big believer in @John's fourth paragraph: when I'm testing theoretically motivated, a priori, orthogonal contrasts, I don't use alpha adjustments.
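To illustrate the orthogonality point, here is a minimal sketch for a hypothetical balanced 2×2 design (just sex and marital status, with made-up cell sizes): with equal $n$s, the effect-coded main-effect contrast vectors have zero dot product, i.e. they are orthogonal.

```python
import numpy as np

# Hypothetical balanced 2x2 design (sex x marital status), n = 5 per cell.
n_per_cell = 5
# One entry per subject, effect-coded (+1/-1) factor levels.
sex     = np.repeat([+1, +1, -1, -1], n_per_cell)  # factor 1
marital = np.repeat([+1, -1, +1, -1], n_per_cell)  # factor 2

# With equal ns the main-effect contrasts are orthogonal:
print(sex @ marital)            # 0
print((sex * marital) @ sex)    # interaction is orthogonal to each too: 0
```

With unequal cell counts those dot products are generally nonzero, which is why the equal-$n$ caveat matters.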
Best Answer
The Bonferroni adjustment will always provide strong control of the family-wise error rate. This means that, whatever the nature and number of the tests, or the relationships between them, if their assumptions are met, it will ensure that the probability of having even one erroneous significant result among all tests is at most $\alpha$, your original error level. It is therefore always available.
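To see the guarantee (and the conservatism) numerically, here is a small simulation of my own: under a complete null with independent tests, p-values are uniform, so the unadjusted and Bonferroni family-wise error rates can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha, n_sims = 10, 0.05, 20000

# All m nulls true, independent tests -> p-values ~ Uniform(0, 1).
p = rng.uniform(size=(n_sims, m))

# "Any false rejection in the family?" per simulated study:
fwer_raw  = np.mean((p < alpha).any(axis=1))      # ~ 1 - 0.95**10, about 0.40
fwer_bonf = np.mean((p < alpha / m).any(axis=1))  # stays at or below ~0.05
print(fwer_raw, fwer_bonf)
```

Note Bonferroni makes no use of the independence assumed here; it gives the same $\alpha$ bound under any dependence, which is exactly why it is always available and also why it is conservative.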
Whether it is appropriate to use it (as opposed to another method or perhaps no adjustment at all) depends on your objectives, the standards of your discipline and the availability of better methods for your specific situation. At the very least, you should probably consider the Holm-Bonferroni method, which is just as general but less conservative.
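For concreteness, here is a sketch of the Holm-Bonferroni step-down rule (my own implementation, with made-up p-values): compare the $k$-th smallest p-value to $\alpha/(m-k+1)$ and stop at the first failure. It rejects everything plain Bonferroni rejects, and sometimes more:

```python
def holm(pvals, alpha=0.05):
    """Holm-Bonferroni step-down: returns a rejection flag per hypothesis."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):          # k-th smallest p-value (k from 0)
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # once one step fails, all larger p-values fail too
    return reject

pvals = [0.010, 0.011, 0.020, 0.040]
print(holm(pvals))                              # all four rejected
print([p <= 0.05 / len(pvals) for p in pvals])  # plain Bonferroni: only two
```

Same family-wise guarantee, strictly more rejections in cases like this one, which is why it is the natural default over plain Bonferroni.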
Regarding your example, since you are performing several tests, you are increasing the family-wise error rate (the probability of rejecting at least one null hypothesis erroneously). If you only perform one test on each half, many adjustments would be possible, including Hommel's method or methods controlling the false discovery rate (which is a different quantity from the family-wise error rate). If you conduct a test on the whole data set followed by several sub-tests, the tests are no longer independent, so some methods are no longer appropriate. As I said before, Bonferroni is in any case always available and guaranteed to work as advertised (but also to be very conservative…).
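To make the false-discovery-rate option concrete, here is a sketch (my own, with made-up p-values) of the Benjamini-Hochberg step-up procedure, one standard FDR-controlling method (Hommel's method is more involved and not shown). Keep in mind it controls a different target than the family-wise error rate:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: controls the false discovery rate at q
    (for independent tests). This bounds the expected *fraction* of false
    rejections among rejections, not the chance of any false rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest k with p_(k) <= k*q/m ...
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * q / m:
            k_max = k
    # ... and reject the k_max smallest p-values.
    reject = [False] * m
    for k, i in enumerate(order, start=1):
        reject[i] = k <= k_max
    return reject

pvals = [0.001, 0.012, 0.025, 0.040, 0.600]
print(benjamini_hochberg(pvals))                # rejects the four smallest
print([p <= 0.05 / len(pvals) for p in pvals])  # Bonferroni: only the first
```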
You could also just ignore the whole issue. Formally, the family-wise error rate is higher, but with only two independent tests at $\alpha = 0.05$ it is about $1 - 0.95^2 \approx 0.0975$, which is still not so bad. You could also start with a test on the whole data set, treated as the main outcome, followed by sub-tests for different groups, uncorrected because they are understood as secondary outcomes or ancillary hypotheses.
If you consider many demographic variables in that way (as opposed to planning from the get-go to test for gender differences, or taking a more systematic modeling approach), the problem becomes more serious, with a significant risk of “data dredging”: one difference comes out significant by chance, allowing you to rescue an inconclusive experiment, with some nice story about the demographic variable to boot, whereas in fact nothing really happened. In that case you should definitely consider some form of adjustment for multiple testing. The logic remains the same with X different hypotheses: testing X hypotheses twice (once on each half of the data set) entails a higher family-wise error rate than testing X hypotheses only once, and you should probably adjust for that.
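The arithmetic behind that last point, assuming independent tests each run at $\alpha = 0.05$ and a hypothetical X = 5 hypotheses:

```python
def fwer(k, alpha=0.05):
    """Family-wise error rate for k independent tests at level alpha,
    when all null hypotheses are true: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

X = 5  # hypothetical number of hypotheses
print(fwer(X))      # tested once:  ~0.226
print(fwer(2 * X))  # tested twice (once per half): ~0.401
```

Doubling the number of tests does not double the family-wise error rate, but it always increases it, and from an already inflated base.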