I guess this is probably a little late, but this result is immediate from Basu's theorem, provided that you are willing to accept that the family of normal distributions with known variance is complete. To apply Basu, fix $\sigma^2$ and consider the family of $N(\mu, \sigma^2)$ distributions for $\mu \in \mathbb R$. The distribution of $S^2$ is free of $\mu$ (indeed, $\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n - 1}$ regardless of $\mu$), so $S^2$ is ancillary, while $\bar X$ is complete sufficient; hence they are independent for every $\mu$ and our fixed $\sigma^2$. Since $\sigma^2$ was arbitrary, this completes the proof.
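If you want a quick empirical check, here is a minimal Monte Carlo sketch in Python (the parameter values are arbitrary choices of mine, and near-zero correlation is of course only a necessary consequence of independence, not a proof of it):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 100_000  # arbitrary illustrative values

x = rng.normal(mu, sigma, size=(reps, n))   # reps independent samples of size n
xbar = x.mean(axis=1)                       # sample mean of each sample
s2 = x.var(axis=1, ddof=1)                  # sample variance S^2 of each sample

# Independence implies zero correlation between xbar and S^2
print(np.corrcoef(xbar, s2)[0, 1])          # should be close to 0
```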
This can also be shown directly without too much hassle. One can find the joint pdf of $(A, \bar X)$, where $A$ denotes the vector of deviations $(X_2 - \bar X, \dots, X_n - \bar X)$, by making a suitable transformation of the joint pdf of $(X_1, \dots, X_n)$. The joint pdf of $(A, \bar X)$ factors as required, which gives independence. To see this quickly without actually doing the transformation, use the decomposition $\sum(x_i - \mu)^2 = \sum(x_i - \bar x)^2 + n(\bar x - \mu)^2$ to write
$$f(x_1, x_2, ..., x_n) = (2\pi \sigma^2)^{-n/2} \exp\left\{-\frac{\sum(x_i - \bar x)^2}{2\sigma^2}\right\} \exp\left\{-\frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}$$
and we can see that everything except for the last term depends only on
$$(x_2 - \bar x, x_3 - \bar x, ..., x_n - \bar x)$$
(note we may retrieve $x_1 - \bar x$ from these $n - 1$ deviations, since all $n$ deviations sum to zero) and the last term depends only on $\bar x$. The transformation is linear, so the Jacobian is a nonzero constant and won't spoil this factorization when we actually pass to the joint pdf of $(A, \bar X)$.
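As a small numerical sanity check of the Jacobian remark (this numpy sketch and the choice $n = 5$ are my own illustration, not part of the original argument), we can write down the matrix of the linear map $(x_1, \dots, x_n) \mapsto (\bar x, x_2 - \bar x, \dots, x_n - \bar x)$ and confirm that its determinant is a nonzero constant:

```python
import numpy as np

n = 5  # arbitrary sample size for illustration

# Matrix of the linear map (x_1, ..., x_n) -> (xbar, x_2 - xbar, ..., x_n - xbar)
T = np.zeros((n, n))
T[0, :] = 1.0 / n          # first output coordinate is xbar
for i in range(1, n):
    T[i, :] = -1.0 / n     # each deviation subtracts xbar ...
    T[i, i] += 1.0         # ... from the corresponding x_i

print(np.linalg.det(T))    # 0.2 == 1/n: a nonzero constant, free of the data
```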
Apart from the fact that it should be $m-1$ instead of $n-1$ in the right-hand denominator, your estimator for $\sigma^2$ looks fine. You can do slightly better on the variance of $\hat\mu$ (though the question didn't ask to optimize it): Consider a general convex combination
$$
\alpha\frac{X_1+\dotso+X_n}n+(1-\alpha)\frac{Y_1+\dotso+Y_m}{2m}
$$
of the individual estimators for $\mu$. The variance of this combined estimator is
$$
n\left(\frac\alpha n\right)^2\sigma^2+m\left(\frac{1-\alpha}{2m}\right)^2\sigma^2=\left(\frac{\alpha^2}n+\frac{(1-\alpha)^2}{4m}\right)\sigma^2\;,
$$
and minimizing this by setting the derivative with respect to $\alpha$ to zero, $\frac{2\alpha}n-\frac{1-\alpha}{2m}=0$, leads to $\alpha=n/(n+4m)$, yielding the minimal variance $\sigma^2/(n+4m)$. For $n=m$ this variance is $\frac15\sigma^2/n=0.2\,\sigma^2/n$, compared to $\frac5{16}\sigma^2/n\approx0.31\,\sigma^2/n$ for your estimator; and for $n$ fixed and $m\to\infty$ (or vice versa), the variance of this estimator tends to zero, whereas the variance of your estimator tends to a non-zero value.
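A quick simulation makes the comparison concrete. I am assuming here, as the variance calculation above suggests, that $X_i \sim N(\mu, \sigma^2)$ and $Y_j \sim N(2\mu, \sigma^2)$, and that your estimator corresponds to the $\alpha = 1/2$ case (which matches the quoted $\frac5{16}\sigma^2/n$); the specific parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, m, reps = 1.0, 1.0, 20, 20, 200_000  # arbitrary illustrative values

# Assumed model, inferred from the variance formulas above:
# X_i ~ N(mu, sigma^2) and Y_j ~ N(2*mu, sigma^2)
X = rng.normal(mu, sigma, size=(reps, n))
Y = rng.normal(2 * mu, sigma, size=(reps, m))

def combined(alpha):
    """Convex combination of the two unbiased estimators of mu."""
    return alpha * X.mean(axis=1) + (1 - alpha) * Y.sum(axis=1) / (2 * m)

alpha_opt = n / (n + 4 * m)
print(combined(alpha_opt).var())  # ~ sigma^2/(n+4m) = 0.01
print(combined(0.5).var())        # ~ (5/16)*sigma^2/n = 0.015625
```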
You could optimize the variance of your unbiased variance estimator in a similar way, though the calculation would be a bit more involved.
I suppose your question is whether the two formulas give the same answer for binary data. Here is an example to illustrate that they are almost the same, but not exactly.
Suppose I have a sample of a thousand zeros and ones in which there are 283 ones. Then $\bar X = 283/1000 = 0.283.$ Thus, $\bar X(1-\bar X) = 0.283(1 - 0.283) = 0.202911.$
An alternate general formula for the sample variance of values $X_i$ is
$$S^2 = \frac{\sum_{i=1}^n X_i^2 - n \bar X^2}{n-1}.$$
In a binary sample $\sum_{i=1}^n X_i^2 = \sum_{i=1}^n X_i$, because $0^2 = 0$ and $1^2 = 1.$
Thus, the general formula gives $S^2 = \frac{283 - 1000(0.283)^2}{999} = 0.2031141.$ If (as in the Comment by @A.S) the denominator were $n = 1000$ instead of $n-1 = 999,$ this would simplify to $$S^2 = 0.283 - 0.283^2 = 0.283(1 - 0.283) = \bar X(1- \bar X).$$
The formula for the population variance is often written with the population size $n$ in the denominator.
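For concreteness, here is the same arithmetic done in numpy (just an illustration of the computations above; `ddof` selects the denominator):

```python
import numpy as np

# The binary sample from the example: 283 ones and 717 zeros
x = np.array([1] * 283 + [0] * 717)
xbar = x.mean()             # 0.283

print(xbar * (1 - xbar))    # ~ 0.202911
print(x.var(ddof=0))        # ~ 0.202911  (denominator n: equals xbar*(1 - xbar))
print(x.var(ddof=1))        # ~ 0.2031141 (denominator n-1: the sample variance S^2)
```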