According to the question, it is an assumed fact that both populations have a common variance, not something one wishes to test.
Maximum likelihood estimators can be derived as usual, either from the two samples separately or by pooling them; in the latter case we have an independent but non-identically distributed sample and the corresponding log-likelihood, which nevertheless creates no special issues. So, beyond deriving the MLEs (which is straightforward), I would say this is a good example for examining whether pooling samples ("unite and conquer"?) is more beneficial than keeping the samples separate ("divide and conquer"?). But "more beneficial" according to which criteria?
We will discuss them as we go along.
Note that we need both sample sizes to be larger than unity, $n_1 >1, n_2 > 1$, otherwise the variance estimator will equal zero.
If we keep the samples separate we will obtain
$$\hat \mu_v = \frac 1{n_1}\sum_{i=1}^{n_1}v_i,\;\;\; \hat \sigma^2_1 = \frac 1{n_1}\sum_{i=1}^{n_1}(v_i-\hat \mu_v)^2$$
and
$$\hat \mu_w = \frac 1{n_2}\sum_{i=1}^{n_2}w_i,\;\;\; \hat \sigma^2_2 = \frac 1{n_2}\sum_{i=1}^{n_2}(w_i-\hat \mu_w)^2$$
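For concreteness, here is a minimal numpy sketch of these estimators (the sample sizes and parameter values are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_v, mu_w, sigma = 1.0, 3.0, 2.0      # hypothetical true values, common sigma
v = rng.normal(mu_v, sigma, size=50)   # first sample, n1 = 50
w = rng.normal(mu_w, sigma, size=30)   # second sample, n2 = 30

# MLEs: sample means, and variances with the 1/n normalisation (ddof=0),
# exactly as in the formulas above.
mu_v_hat, sigma2_1_hat = v.mean(), v.var(ddof=0)
mu_w_hat, sigma2_2_hat = w.mean(), w.var(ddof=0)
print(mu_v_hat, sigma2_1_hat, mu_w_hat, sigma2_2_hat)
```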
The MLEs for the means will be unbiased, efficient, consistent and asymptotically normal.
The variance estimators will be biased, consistent and asymptotically normal (see this post, which holds in general, even for normal samples).
Since we have bias here, it is natural to turn to the Mean Squared Error. The populations are normal, so we also have a finite-sample result:
$$\frac {n_i\hat \sigma^2_i}{\sigma^2} \sim \chi^2_{n_i-1} \Rightarrow \hat \sigma^2_i \sim \operatorname{Gamma}(k_i,\theta_i),\;\; k_i = \frac {n_i-1}{2},\;\; \theta_i = \frac {2\sigma^2}{n_i},\;\;i=1,2$$
Therefore we can calculate the Mean Squared Error (MSE) as
$$MSE(\hat \sigma^2_i) = \text{Var}(\hat \sigma^2_i)+\left[B(\hat \sigma^2_i)\right]^2 = \frac{2(n_i-1)}{n_i^2} \sigma^4 + \frac 1{n_i^2}\sigma^4 = \frac{2n_i-1}{n_i^2} \sigma^4$$
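As a sanity check, a small Monte Carlo sketch (with illustrative values of $n$, $\sigma^2$, and the replication count chosen by me) can be compared against both the Gamma law and the closed-form MSE above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma2, reps = 15, 4.0, 100_000                 # illustrative values
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_mle = samples.var(axis=1, ddof=0)               # MLE with 1/n normalisation

# Gamma(k, theta) with k = (n-1)/2 and theta = 2*sigma^2/n, as derived above;
# since the Gamma law is exact here, the test shows no systematic rejection.
k, theta = (n - 1) / 2, 2 * sigma2 / n
print(stats.kstest(s2_mle, "gamma", args=(k, 0, theta)).pvalue)

# Monte Carlo MSE versus the closed form (2n - 1)/n^2 * sigma^4
print(np.mean((s2_mle - sigma2) ** 2), (2 * n - 1) / n**2 * sigma2**2)
```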
We turn now to the pooled-samples case.
It is easy to verify that the MLEs for the two means will be identical to those from the separate-samples approach. So for these estimators, pooling the two samples or not makes no difference to either their functional form or their properties.
But the variance estimator will be different. It is also rather easy to derive that
$$\hat \sigma^2_p = \frac{n_1}{n_1+n_2}\hat \sigma^2_1+\frac{n_2}{n_1+n_2}\hat \sigma^2_2$$
This is also a biased and consistent estimator, and asymptotically normal as well, being a convex combination of two asymptotically normal variables.
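A quick numerical check (same illustrative setup as before) confirms that the pooled MLE is exactly this convex combination:

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(1.0, 2.0, size=50)      # n1 = 50
w = rng.normal(3.0, 2.0, size=30)      # n2 = 30
n1, n2 = len(v), len(w)

# Pooled MLE: squared deviations from each sample's own mean,
# divided by the total sample size n1 + n2.
s2_pooled = (np.sum((v - v.mean())**2) + np.sum((w - w.mean())**2)) / (n1 + n2)

# Convex combination of the two separate-sample MLEs.
s2_combo = n1 / (n1 + n2) * v.var(ddof=0) + n2 / (n1 + n2) * w.var(ddof=0)
print(np.isclose(s2_pooled, s2_combo))   # True
```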
Turning to the issue of bias and Mean Squared Error, since the two separate-samples estimators are independent we have that
$$\text{Var}(\hat \sigma^2_p) = \frac{n_1^2}{(n_1+n_2)^2}\frac{2(n_1-1)}{n_1^2} \sigma^4+\frac{n_2^2}{(n_1+n_2)^2}\frac{2(n_2-1)}{n_2^2}\sigma^4 = \frac {2n_1+2n_2-4}{(n_1+n_2)^2}\sigma^4$$
and
$$B\left(\hat \sigma^2_p\right) = \frac{n_1}{n_1+n_2}E(\hat \sigma^2_1)+\frac{n_2}{n_1+n_2}E(\hat \sigma^2_2) - \sigma^2 = \frac {-2}{n_1+n_2} \sigma^2$$
So the MSE here is
$$MSE(\hat \sigma^2_p) = \frac {2n_1+2n_2-4}{(n_1+n_2)^2}\sigma^4+\frac {4}{(n_1+n_2)^2} \sigma^4 = \frac {2}{n_1+n_2}\sigma^4$$
In order for sample-pooling to be superior in MSE terms we want that
$$MSE(\hat \sigma^2_p) < MSE(\hat \sigma^2_i), i=1,2$$
$$\Rightarrow \frac {2}{n_1+n_2}\sigma^4 < \frac{2n_i-1}{n_i^2} \sigma^4 \Rightarrow 2n_i^2 < 2n_in_1 - n_1 + 2n_in_2 - n_2$$
This reduces to the same condition for either $i=1$ or $i=2$, namely
$$0 < - n_1 + 2n_1n_2 - n_2 \Rightarrow \frac {n_1+n_2}{n_1n_2} < 2 \Rightarrow \frac 1{n_2} + \frac {1}{n_1} < 2$$
which holds, since both sample sizes are strictly greater than unity.
Therefore we conclude that "unite & conquer" is the MSE-efficient approach here.
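A short sketch tabulating the two closed-form MSEs for a few sample-size pairs (my own picks) makes the dominance visible:

```python
import numpy as np

def mse_separate(n, sigma2=1.0):
    # MSE of the single-sample variance MLE: (2n - 1)/n^2 * sigma^4
    return (2 * n - 1) / n**2 * sigma2**2

def mse_pooled(n1, n2, sigma2=1.0):
    # MSE of the pooled variance MLE: 2/(n1 + n2) * sigma^4
    return 2 / (n1 + n2) * sigma2**2

for n1, n2 in [(2, 2), (5, 50), (10, 10), (100, 3)]:
    print(n1, n2, mse_pooled(n1, n2), mse_separate(n1), mse_separate(n2))
# The pooled MSE is the smallest entry in every row, exactly as the
# condition 1/n1 + 1/n2 < 2 guarantees whenever n1, n2 > 1.
```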
But we will lose something: if $n_1 \neq n_2$, the pooled-sample variance estimator does not enjoy a Gamma finite-sample distributional result, because it is a linear combination of two Gamma random variables with different scale parameters (different $\theta_i$'s). Such a combination is not Gamma-distributed, but follows a rather complicated infinite-sum expression (see this paper). This means that, for conducting tests related to the pooled-sample variance estimator, we will have to resort to the asymptotic normality result.
Alternatively, if the difference between $n_1$ and $n_2$ is not large, and both samples have respectable sizes, we may even consider dropping observations from the larger sample in order to make $n_1 =n_2$ and preserve the Gamma distribution result.
You seem to be confusing many things in your question.
> Is there any quick solution (either by any statistical software or manual workout) to find the maximum likelihood estimates of alpha of three independent variables of a dirichlet distribution
First of all, the Dirichlet distribution is a multivariate distribution; I assume that you mean a trivariate distribution here. The individual variables are not independent: if $(x_1,x_2,\dots,x_k)$ is a draw from a Dirichlet distribution, then $\sum_{i=1}^k x_i = 1$, so the components need to be dependent to meet the constraint.
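A one-line simulation illustrates the constraint (using numpy's Dirichlet sampler; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.dirichlet([1.0, 2.0, 3.0], size=5)   # five trivariate draws
print(x.sum(axis=1))                         # each row sums to 1 (up to float error)
```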
> provided that the initial values of the three parameters are found by the method of moments.
What do you mean by "method of moments" here? There are many ways of estimating the parameters of the Dirichlet distribution (see, e.g., Minka, 2000; Huang, 2005), and in most cases they maximize the likelihood numerically; there is no simple, closed-form solution.
> Or, Is the initial values sufficient to obtain posterior estimates in multinomial-dirichlet bayesian analysis?
To obtain the posterior in a Bayesian analysis you do not need the maximum likelihood estimates of the parameters. Maximum likelihood and Bayesian inference are two different approaches to estimating parameters: maximum likelihood finds the combination of parameters that maximizes the likelihood function, while in the Bayesian case you estimate the parameters by combining the likelihood function with the priors. In the Dirichlet-multinomial model (which is not the same thing as the Dirichlet distribution), this is straightforward because the Dirichlet is a conjugate prior for the multinomial distribution, and we have a closed-form solution: the posterior parameter for the $k$-th category is $\alpha_k + y_k$, where $\alpha_k$ is your prior guess and $y_k$ is the observed number of successes for the $k$-th category in the multinomial distribution.
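As an illustration, the conjugate update is a single vector addition (the prior parameters and counts below are made up for the example):

```python
import numpy as np

alpha_prior = np.array([2.0, 3.0, 5.0])    # hypothetical prior parameters alpha_k
y = np.array([10, 4, 6])                   # observed category counts y_k

alpha_post = alpha_prior + y               # conjugate update: alpha_k + y_k
print(alpha_post)                          # [12.  7. 11.]
print(alpha_post / alpha_post.sum())       # posterior mean of the probabilities
```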
Huang, J. (2005). Maximum likelihood estimation of Dirichlet distribution parameters. CMU Technical Report.
Minka, T. (2000). Estimating a Dirichlet distribution. Online draft.
Suppose $\mathbf p_1, \ldots, \mathbf p_N$ are iid $\operatorname{Dirichlet}(s \mathbf m)$. If I'm understanding you correctly, your question is "why use an iterative scheme when $\hat {\mathbf m} = \frac 1 N \sum_{i = 1} ^ N \mathbf p_i$ works?" You are correct that this is a reasonable estimator. But it isn't the maximum likelihood estimator, which is what we care about! The Dirichlet likelihood of a single observation is
$$ L_i(\pmb \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k\Gamma(\alpha_k)} \prod_k p_{ik}^{\alpha_k - 1} $$
so our goal is to maximize $\prod_i L_i (\pmb \alpha)$ in $\pmb \alpha$; once we do this, we can get the maximum likelihood estimate of $\mathbf m$ by normalizing. But it is easy to see that the likelihood is a function of $\frac 1 N \sum_i \log \mathbf p_i$ rather than $\frac 1 N \sum_i \mathbf p_i$ (I'm using $\log$ elementwise here). In some sense, we might think of $\log \mathbf p_i$ as the "appropriate scale" of the data, at least for the Dirichlet distribution, rather than the untransformed $\mathbf p_i$.
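To make the sufficient-statistic point concrete, here is a sketch (assuming numpy/scipy; `dirichlet_loglik` is my own name) that evaluates the full log-likelihood using nothing but $\frac 1 N \sum_i \log \mathbf p_i$:

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_loglik(alpha, log_p_bar, N):
    """Total log-likelihood of N iid Dirichlet draws, written purely in terms
    of the sufficient statistic log_p_bar = (1/N) * sum_i log(p_i)."""
    return N * (gammaln(alpha.sum()) - gammaln(alpha).sum()
                + np.dot(alpha - 1, log_p_bar))

rng = np.random.default_rng(4)
P = rng.dirichlet([2.0, 3.0, 5.0], size=1000)   # simulated data
log_p_bar = np.log(P).mean(axis=0)              # the only data summary needed
print(dirichlet_loglik(np.array([2.0, 3.0, 5.0]), log_p_bar, len(P)))
```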
So, we believe that the MLE is not $\frac 1 N \sum_i \mathbf p_i$ but rather is some complicated function of $\frac 1 N \sum_i \log \mathbf p_i$. The question now becomes "why use the MLE rather than the easy estimator?" Well, we have some theorems which say the MLE has certain optimality properties. So, we get a more efficient estimator with the MLE, although $\frac 1 N \sum_i \mathbf p_i$ may still be useful as a starting point for the iterative algorithm. Now, I'm not sure how good the MLE really is here, considering that the data must actually be Dirichlet distributed for it to work, whereas $\frac 1 N \sum \mathbf p_i$ is consistent no matter what. But that is another story.
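For completeness, here is a minimal sketch of Minka's fixed-point scheme with a moment-based starting point; `dirichlet_mle` and `inv_digamma` are my names, and the crude precision start from the first coordinate is just one simple choice among several:

```python
import numpy as np
from scipy.special import digamma, polygamma

def inv_digamma(y, iters=5):
    # Newton iterations for the inverse digamma function (Minka 2000, appendix).
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(iters):
        x -= (digamma(x) - y) / polygamma(1, x)
    return x

def dirichlet_mle(P, iters=100, tol=1e-10):
    """Minka's fixed-point iteration for the Dirichlet MLE.
    P is an (N, K) array of probability vectors (rows sum to 1)."""
    log_p_bar = np.log(P).mean(axis=0)           # sufficient statistic
    m = P.mean(axis=0)                           # moment-based mean estimate
    # crude moment estimate of the precision s from the first coordinate
    s = m[0] * (1 - m[0]) / P[:, 0].var() - 1
    alpha = s * m                                # starting point
    for _ in range(iters):
        # fixed point: psi(alpha_k_new) = psi(sum_j alpha_j) + log_p_bar_k
        alpha_new = inv_digamma(digamma(alpha.sum()) + log_p_bar)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha

rng = np.random.default_rng(5)
P = rng.dirichlet([2.0, 3.0, 5.0], size=5000)
print(dirichlet_mle(P))    # should land close to [2, 3, 5]
```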