Solved – Why is asymptotic normality important for an estimator

Is it because it allows easy construction of confidence intervals? Isn't it still possible to construct confidence intervals without this property, i.e. if the estimator converged to some other distribution? What are some reasons for wanting an estimator to be asymptotically normal?

Tags: confidence-interval, estimators, normality-assumption
Related Solutions
It is easier to construct confidence ellipsoids for multivariate normals. The probabilities you want likely require messy numerical integration, since the regions are open-sided and rectangular. You could certainly bootstrap the samples and get bootstrap estimates of the joint probabilities, but I think that would not commonly be done, since an exact answer exists even though it involves multidimensional numerical integration.
There are several misunderstandings in your post (some of which are common; you may have been told the wrong thing because the person telling you was simply passing on the misinformation).
First, the bootstrap is not the savior of small sample sizes. The bootstrap actually fares quite poorly for small sample sizes, even when the population is normal. This question, answer, and discussion should shed some light on that, and the article here gives more details and background.
Both the t-test and the bootstrap are based on sampling distributions, that is, on the distribution of the test statistic.
The exact t-test is based on theory and the condition that the population/process generating the data is normal. The t-test happens to be fairly robust to the normality assumption (as far as the size of the test goes, power and precision can be another matter) so for some cases the combination of "Normal enough" and "Large sample size" means that the sampling distribution is "close enough" to normal that the t-test is a reasonable choice.
The bootstrap, instead of assuming a normal population, uses the sample CDF as an estimate of the population and computes/estimates (usually through simulation) the true sampling distribution (which may be normal-ish, but does not need to be). If the sample does a reasonable job of representing the population, then the bootstrap works well. But for small sample sizes it is very easy for the sample to represent the population poorly, and bootstrap methods perform badly in those cases (see the simulation and paper referenced above).
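The resampling idea described above can be sketched in a few lines. This is a minimal percentile-bootstrap sketch assuming NumPy; the function name `bootstrap_ci` and the exponential example data are illustrative choices, not anything from the answer:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(sample, stat=np.mean, n_boot=10_000, alpha=0.05, rng=rng):
    """Percentile bootstrap CI: resample the data with replacement,
    recompute the statistic each time, and take empirical quantiles
    of the resulting estimate of the sampling distribution."""
    sample = np.asarray(sample)
    boots = np.array([
        stat(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

# Skewed data: no normality assumption is made anywhere above
data = rng.exponential(scale=2.0, size=50)
lo, hi = bootstrap_ci(data)
```

Note that the interval comes straight from the simulated sampling distribution, which "sometimes looks normal, but still works when it is not".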
The advantage of the t-test is that if all the assumptions hold (or are close to holding) then it works well (I think it is actually the uniformly most powerful test). The disadvantage is that it does not work well if the assumptions are not true (and not close to being true), and in some cases the assumptions make a bigger difference than in others. And t-test theory does not apply for some parameters/statistics of interest, e.g. trimmed means, standard deviations, quantiles, etc.
The advantage of the bootstrap is that it can estimate the sampling distribution without many of the assumptions needed by parametric methods. It works for statistics other than the mean and in cases where other assumptions do not hold (e.g. 2 samples, unequal variances). The disadvantage of the bootstrap is that it is very dependent on the sample representing the population, because it does not have the support of those other assumptions. The bootstrap does not give you normality; it gives you the sampling distribution (which sometimes looks normal, but still works when it is not) without needing the assumptions about the population.
Where it is reasonable to assume that the population is normal (or at least normal enough), the t-test will be the better of the two.
If you do not have normality and do have small samples, then neither the t-test nor the bootstrap should be trusted. For the 2-sample case a permutation test will work well if you are willing to assume equal distributions (including equal variances) under the null hypothesis. This is a very reasonable assumption when doing a randomized experiment, but may not be when comparing 2 separate populations (but then, if you believe that the 2 populations may have different spreads/shapes, perhaps a test of means is not the most interesting question or the best place to start).
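A hedged sketch of the two-sample permutation test just described, assuming NumPy; the helper name `perm_test_mean_diff` is invented for illustration. Under the null of equal distributions, group labels are exchangeable, so we pool the data, reshuffle the labels, and see how often a shuffled difference is as extreme as the observed one:

```python
import numpy as np

rng = np.random.default_rng(1)

def perm_test_mean_diff(x, y, n_perm=10_000, rng=rng):
    """Two-sample permutation test for a difference in means.
    H0: both groups come from one distribution, so labels are
    exchangeable and reshuffling them generates the null
    distribution of the test statistic."""
    x, y = np.asarray(x), np.asarray(y)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = perm[:x.size].mean() - perm[x.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one keeps p > 0

# Two small samples drawn from the same distribution
p = perm_test_mean_diff(rng.normal(size=8), rng.normal(size=8))
```

Because the test conditions on the pooled sample, it is exact for its stated assumptions even at small sample sizes, which is exactly the regime where the t-test and bootstrap both struggle.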
With huge sample sizes the large sample theory will benefit both t-tests and bootstrapping and you will see little or no difference when comparing means.
With moderate sample sizes the bootstrap can perform well and may be preferred when you are unwilling to make the assumptions needed for the t-test procedures.
The important thing is to understand the assumptions and conditions required by the different procedures you are considering, how deviations from those conditions will affect your analysis, and how well you believe the population/process that produced your data fits those conditions. Simulation can help you understand how the deviations affect the different methods. Remember that all statistical procedures have conditions and assumptions (with the possible exception of SnowsCorrectlySizedButOtherwiseUselessTestOfAnything, but if you use that test then people will make assumptions about you).
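As a rough illustration of using simulation to study how deviations affect a procedure, here is a sketch (assuming NumPy and SciPy; the helper name `estimated_size` and the particular populations are illustrative) that estimates the actual size of a one-sample t-test under a normal and under a skewed population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def estimated_size(sampler, n, true_mean, n_sims=5000, alpha=0.05):
    """Monte Carlo estimate of a t-test's actual size: draw many
    samples from a population whose mean equals the null value and
    count how often H0 is (wrongly) rejected at level alpha."""
    rejections = 0
    for _ in range(n_sims):
        sample = sampler(n)
        if stats.ttest_1samp(sample, popmean=true_mean).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

# Normal population with n = 10: size should sit near the nominal 5%
size_normal = estimated_size(lambda n: rng.normal(size=n),
                             n=10, true_mean=0.0)
# Skewed population (exponential, mean 1) with n = 10: size can drift
size_skewed = estimated_size(lambda n: rng.exponential(size=n),
                             n=10, true_mean=1.0)
```

The same scaffold works for comparing the bootstrap or a permutation test: swap the rejection rule and see which procedure holds its size under the deviations you actually worry about.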
Best Answer
I wouldn't say it's important, really, but when it happens, it can be convenient, and the plain fact is, it happens a lot: for many popular estimators in commonly used models, the distribution of an appropriately standardized estimator will be asymptotically normal.
So whether I wish it or not, it happens. [Indeed, in these notes, Charles Geyer says "almost all estimators of practical interest are [...] asymptotically normal", and I think that's probably a fair assessment.]
Well, it does allow easy construction of confidence intervals if the sample sizes are large enough that you could reasonably approximate the sampling distribution as normal ... as long as you have a computer, or tables, or happen to remember the critical values you want. [Without any of those it would be mildly inconvenient, but I can manage even if I decide to compute an 85% interval or a 96.5% interval and have no computer or tables: I can take a nearby value I know, or a pair of nearby values either side of the one I want, and do a little playing with a calculator (or at worst pen and paper) to get an interval that will be accurate enough. After all, it's already an approximation in at least a couple of different ways, so how accurate does it really need to be?]
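With a computer, the "any confidence level you like" point above is a one-liner. A minimal sketch, assuming SciPy for the normal critical value (the helper name `normal_ci` and the 96.5% default echo the answer's example but are otherwise invented):

```python
import numpy as np
from scipy import stats

def normal_ci(sample, conf=0.965):
    """Large-sample CI for the mean via asymptotic normality:
    estimate +/- z * SE, where z = Phi^{-1}((1 + conf) / 2)
    handles any confidence level, not just the tabled ones."""
    sample = np.asarray(sample)
    z = stats.norm.ppf((1 + conf) / 2)   # roughly 2.11 for 96.5%
    se = sample.std(ddof=1) / np.sqrt(sample.size)
    return sample.mean() - z * se, sample.mean() + z * se

rng = np.random.default_rng(3)
lo, hi = normal_ci(rng.normal(loc=5.0, size=200))
```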
But I really wouldn't say that "I want asymptotic normality because of that".
I construct finite-sample CIs all the time without bothering with normality. I'm perfectly happy to use a binomial(40,0.5) interval or a $t_{80}$ interval or a $\chi^2_{100}$ interval or an $F_{60,120}$ interval instead of trying to invoke asymptotic normality in any of those cases, so asymptotic-something-else wouldn't have been a big deal. Indeed, I use permutation tests at least sometimes, and generate CIs from permutation or randomization distributions, and I don't give a damn about the asymptotic distribution when I do (since one conditions on the sample, asymptotics are irrelevant).
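As one concrete example of a finite-sample interval of the kind listed above, which never invokes asymptotic normality, here is a sketch of the exact chi-squared interval for a normal population's variance (assuming SciPy; the helper name `variance_ci` is invented):

```python
import numpy as np
from scipy import stats

def variance_ci(sample, conf=0.95):
    """Exact finite-sample CI for a normal population's variance,
    built from the chi-squared sampling distribution of
    (n - 1) s^2 / sigma^2 -- no appeal to asymptotic normality."""
    sample = np.asarray(sample)
    n = sample.size
    s2 = sample.var(ddof=1)
    alpha = 1 - conf
    lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
    return lower, upper

rng = np.random.default_rng(4)
lo, hi = variance_ci(rng.normal(scale=2.0, size=101))  # true variance 4
```

The interval is exact for every n under the normal-population assumption; nothing about it would change if the relevant sampling distribution had been some other known, tabulated distribution instead.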
Yes, absolutely. Imagine some scaled estimator were, say, asymptotically chi-squared with 2 df (which is not normal). Would I be bothered? Would it even be mildly inconvenient? Not a bit. (If anything, in some ways that would be easier.)
But even if the asymptotic distribution weren't especially convenient, that wouldn't necessarily bother me. For example, I can happily use a Kolmogorov-Smirnov test without difficulty, and the statistic is an estimator of something. It's not convenient in the sense that I could only write down the asymptotic distribution as an infinite sum (but it is convenient in that I just go ahead and use either tables or a computer program to do things with it ... just as I do with the normal).
On the other hand, we needn't (and shouldn't) ignore the fact that the most common kinds of estimator will often be asymptotically normal -- MLEs are usually asymptotically normal, as are method-of-moments estimators and estimators based on (non-extreme) quantiles (and more besides). I'm not going to ignore it when it happens.
I don't, especially. But if it happens, I'm happy to use that fact whenever it's convenient and reasonable to do that instead of something else.