The fundamental difference is that maximum-likelihood-based methods cannot integrate the nuisance parameters out, because the likelihood function is not a PDF in the parameters and does not obey the laws of probability.
In maximum likelihood methods, the ideal way to deal with nuisance parameters is through marginal or conditional likelihoods, but these are defined differently from the integrated likelihood in the question you linked. (There is a notion of an integrated (marginal/conditional) likelihood function, as in the linked question, but that is not, strictly speaking, the marginal likelihood function.)
Say you have a parameter of interest, $\theta$, and a nuisance parameter, $\lambda$. Suppose a transformation of your data $X$ to $(Y, Z)$ exists such that either $Y$ or $Y|Z$ depends only on $\theta$. If $Y$ depends only on $\theta$, the joint density can be written
$f(Y, Z; \theta, \lambda) = f_{Y}(Y; \theta) f_{Z|Y}(Z|Y; \theta, \lambda)$.
In the latter case, where $Y|Z$ depends only on $\theta$, we have
$f(Y, Z; \theta, \lambda) = f_{Y|Z}(Y|Z; \theta) f_{Z}(Z; \theta, \lambda)$.
In either case, the factor depending on $\theta$ alone is the one of interest. In the former case it is the basis for the definition of the marginal likelihood; in the latter, the conditional likelihood. The important point is to isolate a component that depends on $\theta$ alone.
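A standard example of the first factorization (my illustration, not part of the original answer): suppose $X_1,\ldots,X_n$ are i.i.d. $N(\mu,\sigma^2)$ with $\sigma^2$ of interest and $\mu$ a nuisance. Take $Y = S^2 = (n-1)^{-1}\sum_i (X_i-\bar X)^2$ and $Z = \bar X$. Since $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ independently of $\bar X$, the joint density factors as $$f(Y, Z; \sigma^2, \mu) = f_{Y}(Y; \sigma^2)\, f_{Z}(Z; \mu, \sigma^2),$$ and the marginal likelihood for $\sigma^2$ based on $Y$ alone is the familiar REML likelihood.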
If we can't find such a transformation, we look to other likelihood functions to eliminate the nuisance parameter. We usually start with the profile likelihood, $L_p(\theta) = \sup_{\lambda} L(\theta, \lambda)$. To reduce the bias in the resulting MLE, we try to obtain approximations to marginal or conditional likelihoods, usually through a "modified profile likelihood" function (yet another likelihood function!).
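To make the profiling step concrete, here is a minimal sketch (my illustration in Python, using the simplest possible model: i.i.d. normal data with mean $\mu$ of interest and variance $\sigma^2$ as the nuisance):

```python
import numpy as np

def profile_loglik(mu, x):
    """Profile log-likelihood of the normal mean: the nuisance
    sigma^2 is replaced by its conditional MLE, mean((x - mu)^2)."""
    n = len(x)
    sigma2_hat = np.mean((x - mu) ** 2)
    # log L(mu, sigma2_hat(mu)), up to the constant -n/2 * log(2*pi)
    return -0.5 * n * np.log(sigma2_hat) - 0.5 * n

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=50)
grid = np.linspace(1.0, 3.0, 201)
lp = np.array([profile_loglik(m, x) for m in grid])
print("profile MLE of mu:", grid[lp.argmax()])  # close to x.mean()
```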
There are many details, but the short story is that likelihood methods treat nuisance parameters quite differently from Bayesian methods. In particular, the estimated likelihoods don't account for uncertainty in the nuisance parameter, whereas Bayesian methods do, through the specification of a prior.
There are arguments in favor of an integrated likelihood function that lead to something resembling the Bayesian framework. If you're interested, I can dig up some references.
A Normal approximation works extremely well, especially in the tails. Use a mean of $\alpha/(\alpha+\beta)$ and a variance of $\frac{\alpha\beta}{(\alpha+\beta)^{2} (1+\alpha+\beta)}$. For example, the absolute relative error in the tail probability in a tough situation (where skewness might be of concern) such as $\alpha = 10^6, \beta = 10^8$ peaks around $0.00026$ and is less than $0.00006$ when you're more than 1 SD from the mean. (This is not because beta is so large: with $\alpha = \beta = 10^6$, the absolute relative errors are bounded by $0.0000001$.) Thus, this approximation is excellent for essentially any purpose involving 99% intervals.
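A quick way to check these error claims yourself (my sketch, using SciPy rather than anything in the original answer):

```python
import numpy as np
from scipy import stats

# Normal approximation to Beta(a, b): match the exact mean and variance.
a, b = 1e6, 1e8
mean = a / (a + b)
var = a * b / ((a + b) ** 2 * (1 + a + b))
approx = stats.norm(loc=mean, scale=np.sqrt(var))
exact = stats.beta(a, b)

# Compare tail probabilities on a grid a few SDs around the mean.
xs = mean + np.sqrt(var) * np.linspace(-4, 4, 9)
for x in xs:
    p_exact, p_norm = exact.cdf(x), approx.cdf(x)
    print(f"x={x:.6f}  exact={p_exact:.6g}  normal={p_norm:.6g}  "
          f"rel.err={abs(p_norm - p_exact) / p_exact:.2e}")
```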
In light of the edits to the question, note that one does not compute beta integrals by naively integrating the integrand: of course you'll get underflows (although they don't really matter, because they don't contribute appreciably to the integral). There are many, many ways to compute or approximate the integral, as documented in Johnson & Kotz (Distributions in Statistics). An online calculator can be found at http://www.danielsoper.com/statcalc/calc37.aspx. You actually need the inverse of this integral. Some methods to compute the inverse are documented on the Mathematica site at http://functions.wolfram.com/GammaBetaErf/InverseBetaRegularized/. Code is provided in Numerical Recipes. A really nice online calculator is the Wolfram Alpha site (www.wolframalpha.com): enter inverse beta regularized (.005, 1000000, 1000001)
for the left endpoint and inverse beta regularized (.995, 1000000, 1000001)
for the right endpoint ($\alpha=1000000, \beta=1000001$, 99% interval).
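If you would rather compute this locally, here is a sketch using SciPy's inverse regularized incomplete beta function (my example; any library exposing this special function will do):

```python
from scipy import special

# 99% central interval for Beta(alpha, beta) via the inverse
# regularized incomplete beta function.
alpha, beta = 1_000_000, 1_000_001
lo = special.betaincinv(alpha, beta, 0.005)  # left endpoint
hi = special.betaincinv(alpha, beta, 0.995)  # right endpoint
print(f"99% interval: ({lo:.8f}, {hi:.8f})")
```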
Best Answer
According to the documentation, Matlab estimates the parameters of the Beta distribution by maximum likelihood. In this case, I am afraid the sampling distributions of the estimators are not available. Let $\{X_i\}_{i=1}^n$ be a set of i.i.d. observations from the standard Beta distribution $B(\alpha,\beta)$; obtaining the MLEs $\hat{\alpha}$ and $\hat{\beta}$ requires solving the simultaneous equations \begin{align}\psi(\hat{\alpha})-\psi(\hat{\alpha}+\hat{\beta})&=\frac{1}{n}\sum_{i=1}^n\ln (X_i),\\\psi(\hat{\beta})-\psi(\hat{\alpha}+\hat{\beta})&=\frac{1}{n}\sum_{i=1}^n\ln (1-X_i),\end{align} where $\psi$ denotes the digamma function (see, e.g., Gnanadesikan, Pinkham, and Hughes, Maximum Likelihood Estimation of the Parameters of the Beta Distribution from Smallest Order Statistics, for more details).

As far as I know, no general analytical solutions are available, let alone the exact distributions of $\hat{\alpha}$ and $\hat{\beta}$. It is possible to solve them in special cases, however. For example, when $\beta$ is known to be exactly $1$, properties of the digamma function reduce the first equation to $$\hat{\alpha}=-\frac{n}{\sum_{i=1}^n\ln (X_i)},$$ and one can show that $-\sum_{i=1}^n\ln (X_i)\sim\varGamma(n,1/\alpha)$, which implies that $\hat{\alpha}$ has a scaled inverse-Gamma distribution.

Other than that, one has to resort to the asymptotic normality of the MLEs to make inferences about $\hat{\alpha}$ and $\hat{\beta}$; see also, e.g., Lau and Lau, Effective Procedures for Estimating Beta Distribution's Parameters and Their Confidence Intervals, for a procedure using the Pearson distribution. In addition, one may use this normality to derive confidence intervals for the mean, since $$\hat{\mu}=\frac{\hat{\alpha}}{\hat{\alpha}+\hat{\beta}}$$ will have a correlated normal ratio distribution if both $\hat{\alpha}$ and $\hat{\beta}$ are normal.
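For illustration, the two digamma equations above are straightforward to solve numerically. Here is a minimal sketch (my code, not Matlab's actual implementation), using method-of-moments estimates as the starting point:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import digamma

def beta_mle(x):
    """Numerically solve the digamma likelihood equations for Beta(a, b)."""
    s1 = np.mean(np.log(x))
    s2 = np.mean(np.log1p(-x))  # log(1 - x), computed stably

    def equations(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - s1,
                digamma(b) - digamma(a + b) - s2)

    # Method-of-moments estimates make a reasonable starting point.
    m, v = x.mean(), x.var()
    common = m * (1 - m) / v - 1
    return fsolve(equations, (m * common, (1 - m) * common))

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=2000)
print(beta_mle(x))  # should be close to (2, 5)
```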
For your second question, I take it that logs of the estimates are used most likely to cope with the constraints of the parameter space. Note that both $\alpha$ and $\beta$ must lie in $(0,\infty)$. But if $\hat{\alpha}=1$ and you use the normal approximation directly, you may end up with a confidence interval such as $(-0.5,2.5)$. You can get around this by mapping to the log scale and then back, though I am not sure how accurate the result will be then.
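To make the log-scale trick concrete (my sketch; the standard error se is assumed to come from the observed information, as in the asymptotic-normality discussion above): by the delta method, $\mathrm{se}(\log\hat\alpha) \approx \mathrm{se}(\hat\alpha)/\hat\alpha$, so one builds the interval on the log scale and exponentiates.

```python
import numpy as np

def log_scale_ci(alpha_hat, se, z=1.96):
    """95% CI for alpha built on the log scale and mapped back.
    Delta method: se(log alpha_hat) ~= se(alpha_hat) / alpha_hat."""
    se_log = se / alpha_hat
    lo = np.exp(np.log(alpha_hat) - z * se_log)
    hi = np.exp(np.log(alpha_hat) + z * se_log)
    return lo, hi  # endpoints are guaranteed positive

# With alpha_hat = 1 and a large standard error, the direct interval
# (1 - 1.96*0.75, 1 + 1.96*0.75) dips below zero; this one cannot.
print(log_scale_ci(1.0, 0.75))
```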
As an interesting side note, the book cited in the documentation (Hahn and Shapiro's Statistical Models in Engineering) actually discusses method-of-moments estimation rather than MLE; it is not clear to me why it is cited there.