Just an initial remark: if you want computational speed, you usually have to sacrifice accuracy; "more accuracy" = "more time" in general. Anyway, here is a second-order approximation, which should improve on the "crude" approximation you suggested in your comment above:
$$E\Bigg(\frac{X_{j}}{\sum_{i}X_{i}}\Bigg)\approx
\frac{E[X_{j}]}{E[\sum_{i}X_{i}]}
-\frac{cov[\sum_{i}X_{i},X_{j}]}{E[\sum_{i}X_{i}]^2}
+\frac{E[X_{j}]}{E[\sum_{i}X_{i}]^3} Var[\sum_{i}X_{i}]
$$
For independent $X_{i}\sim \mathrm{Gamma}(\alpha_{i},\beta_{i})$ (rate parameterization, so $E[X_{i}]=\alpha_{i}/\beta_{i}$ and $Var[X_{i}]=\alpha_{i}/\beta_{i}^{2}$, and by independence $cov[\sum_{i}X_{i},X_{j}]=Var[X_{j}]$), this becomes
$$= \frac{\alpha_{j}/\beta_{j}}{\sum_{i} \alpha_{i}/\beta_{i}}\times\Bigg[1 - \frac{1/\beta_{j}}{\sum_{i} \alpha_{i}/\beta_{i}}
+ \frac{\sum_{i} \alpha_{i}/\beta_{i}^{2}}{\Big(\sum_{i} \alpha_{i}/\beta_{i}\Big)^{2}}\Bigg]
$$
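A quick way to sanity-check the formula is a Monte Carlo comparison. Here is a minimal Python sketch, assuming independent $X_{i}\sim \mathrm{Gamma}(\alpha_{i},\beta_{i})$ with $\beta_i$ a rate; the parameter values are made-up examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example parameters: X_i ~ Gamma(alpha_i, rate = beta_i), independent.
alpha = np.array([2.0, 3.0, 5.0])
beta = np.array([1.0, 2.0, 4.0])
j = 0  # component of interest

# Monte Carlo estimate of E[X_j / sum_i X_i]; numpy's gamma takes scale = 1/rate.
X = rng.gamma(shape=alpha, scale=1.0 / beta, size=(1_000_000, len(alpha)))
mc = (X[:, j] / X.sum(axis=1)).mean()

# Second-order approximation from above:
# E[X_j]/E[S] - Cov(S, X_j)/E[S]^2 + E[X_j] Var(S)/E[S]^3, with S = sum_i X_i.
m = alpha / beta        # E[X_i]
v = alpha / beta**2     # Var[X_i]
ES, VarS = m.sum(), v.sum()
approx = m[j] / ES - v[j] / ES**2 + m[j] * VarS / ES**3

print(f"Monte Carlo: {mc:.6f}  second-order approx: {approx:.6f}")
```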
EDIT An explanation for the above expansion was requested. The short answer is Wikipedia. The long answer is given below.
Write $f(x,y)=\frac{x}{y}$ and expand around $(\mu_x,\mu_y)=(E[X],E[Y])$. We need all the second-order derivatives of $f$; the first-order terms will "cancel" because they all involve multiples of $X-E(X)$ and $Y-E(Y)$, both of which have expectation zero.
$$\frac{\partial^2 f}{\partial x^2}=0$$
$$\frac{\partial^2 f}{\partial x \partial y}=-\frac{1}{y^2}$$
$$\frac{\partial^2 f}{\partial y^2}=2\frac{x}{y^3}$$
And so the Taylor series up to second order (dropping the first-order terms, which vanish in expectation, and the $\partial^2 f/\partial x^2$ term, which is zero) is given by:
$$\frac{x}{y} \approx \frac{\mu_x}{\mu_y}+\frac{1}{2}\Bigg(-\frac{1}{\mu_y^2}2(x-\mu_x)(y-\mu_y) + 2\frac{\mu_x}{\mu_y^3}(y-\mu_y)^2 \Bigg)$$
Taking expectations yields:
$$E\Big[\frac{X}{Y}\Big] \approx \frac{\mu_x}{\mu_y}-\frac{1}{\mu_y^2}E\Big[(X-\mu_x)(Y-\mu_y)\Big] + \frac{\mu_x}{\mu_y^3}E\Big[(Y-\mu_y)^2\Big]$$
Which is the answer I gave (although I initially forgot the minus sign in the second term).
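If you want to double-check the derivatives and the expansion mechanically, here is a symbolic sketch (with sympy; the symbol names are my own choices) that reproduces them:

```python
import sympy as sp

x, y, mx, my = sp.symbols('x y mu_x mu_y', positive=True)
f = x / y

# Second-order partials of f(x, y) = x/y:
fxx = sp.diff(f, x, 2)   # 0
fxy = sp.diff(f, x, y)   # -1/y**2
fyy = sp.diff(f, y, 2)   # 2*x/y**3
print(fxx, fxy.subs({x: mx, y: my}), fyy.subs({x: mx, y: my}))

# Full second-order Taylor polynomial of x/y around (mu_x, mu_y);
# the fxx term drops out because it is identically zero.
taylor = (f.subs({x: mx, y: my})
          + sp.diff(f, x).subs({x: mx, y: my}) * (x - mx)
          + sp.diff(f, y).subs({x: mx, y: my}) * (y - my)
          + sp.Rational(1, 2) * (fxy.subs({x: mx, y: my}) * 2 * (x - mx) * (y - my)
                                 + fyy.subs({x: mx, y: my}) * (y - my) ** 2))
print(sp.simplify(taylor))
```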
I don't think this is an "overparameterized" model at all. I would argue that by placing a prior over the Dirichlet parameters, you're being less committal about any particular outcome. In particular, as you probably know, for symmetric Dirichlet distributions (i.e. $\alpha_1 = \alpha_2 = \cdots = \alpha_K$) setting $\alpha<1$ gives more prior probability to sparse multinomial distributions, while $\alpha>1$ gives more prior probability to smooth multinomial distributions.
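To see this concretely, here is a small sketch drawing from symmetric Dirichlets on both sides of $\alpha = 1$ (the dimension $K$ and the two $\alpha$ values are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10

# Symmetric Dirichlet draws for alpha < 1 (sparse) vs alpha > 1 (smooth).
sparse = rng.dirichlet(np.full(K, 0.1), size=5)
smooth = rng.dirichlet(np.full(K, 10.0), size=5)

# With alpha = 0.1 most of the mass piles onto a few components; with
# alpha = 10 the draws hover near the uniform vector (1/K, ..., 1/K).
print("alpha=0.1, max component per draw:", sparse.max(axis=1).round(2))
print("alpha=10,  max component per draw:", smooth.max(axis=1).round(2))
```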
In cases where one has no strong prior expectation of either sparse or dense multinomial distributions, placing a hyperprior over your Dirichlet parameters gives your model some added flexibility to choose between them.
I originally got the idea of doing this from this paper. The hyperprior they use is slightly different from what you suggest: they sample a probability vector from a Dirichlet and then scale it by a draw from an exponential (or gamma). So the model is
\begin{eqnarray}
\beta &\sim& \mathrm{Dirichlet}(1)\\
\lambda &\sim& \mathrm{Exponential}(\cdot)\\
\theta &\sim& \mathrm{Dirichlet}(\beta\lambda)
\end{eqnarray}
The extra Dirichlet is simply to avoid imposing symmetry.
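For concreteness, here is a one-draw sketch of that generative model in Python (the rate of the Exponential is an arbitrary placeholder; note that numpy parameterizes the exponential by its scale, $1/\text{rate}$):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 10
rate = 1.0  # hypothetical rate for the Exponential hyperprior

# beta ~ Dirichlet(1): a uniform draw from the simplex (breaks symmetry).
beta = rng.dirichlet(np.ones(K))
# lambda ~ Exponential(rate): an overall concentration scale.
lam = rng.exponential(1.0 / rate)
# theta ~ Dirichlet(beta * lambda): the multinomial parameter vector.
theta = rng.dirichlet(beta * lam)
print(theta.round(3))
```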
I've also seen people use just a Gamma hyperprior for a Dirichlet in the context of hidden Markov models with multinomial emission distributions, but I can't seem to find a reference. I've also encountered similar hyperpriors in topic models.
Best Answer
A Beta distribution is just a special case of the Dirichlet distribution: a Beta distribution is a Dirichlet distribution with two parameters, $\alpha$ and $\beta$.
The Dirichlet is the multidimensional generalisation (of the Beta) with $n$ parameters instead of two, denoted $\alpha_1, \dots, \alpha_n$. Setting all the alphas of a Dirichlet to 1 (no matter how many dimensions we are talking about) gives us the $n$-dimensional equivalent of $\mathrm{Beta}(1,1)$, i.e. the uniform distribution on the simplex.
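You can verify the correspondence empirically: the first coordinate of a two-parameter Dirichlet draw should be indistinguishable from a Beta draw with the same parameters. A quick sketch (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, b = 2.0, 5.0  # arbitrary example parameters

# First coordinate of Dirichlet(a, b) draws vs direct Beta(a, b) draws.
dir_first = rng.dirichlet([a, b], size=100_000)[:, 0]
beta_draws = rng.beta(a, b, size=100_000)

# A two-sample KS test should typically not reject: both samples
# come from the same distribution.
print(stats.ks_2samp(dir_first, beta_draws))
```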