Because a VAE is really a graphical model more than it is an autoencoder, it can be confusing what exactly "reconstruction" means in this context.
Recall that we have a lower bound on the log probability, given by the RHS of:
$$\log p(x) - D_{KL}(Q(z|x)||P(z|x)) = E_{z\sim Q}[\log p(x|z)] - D_{KL}(Q(z|x)||P(z))$$
Computing this lower bound -- a necessary prerequisite for a backprop pass to maximize it -- corresponds to passing $x$ through the encoder to obtain $Q(z|x)$, computing the KL divergence, and then estimating $E_{z \sim Q}[\log p(x|z)]$ by sampling once or more (but usually just once) from $Q$ and running the decoder.
This process of estimating the posterior with the encoder, then sampling to approximate the expectation on the RHS, so closely mimics the computation an autoencoder would do that we call it "reconstruction". However, it is really just a side effect of trying to maximize the log probability of the inputs.
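As a concrete illustration, here is a minimal NumPy sketch of that single forward pass. The linear "encoder" and "decoder" weights below are hypothetical random placeholders (not a trained model), and the decoder is assumed to be a unit-variance Gaussian, so $\log p(x|z)$ reduces to a negative squared error up to an additive constant:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 2  # hypothetical input and latent dimensions

# Placeholder "networks": random linear maps standing in for the encoder
# (which outputs the mean and log-variance of Q(z|x)) and the decoder.
W_mu = rng.normal(size=(k, d))
W_lv = rng.normal(size=(k, d)) * 0.1
W_dec = rng.normal(size=(d, k))

def elbo_single_sample(x):
    # Encoder pass: parameters of Q(z|x) = N(mu, diag(exp(log_var)))
    mu, log_var = W_mu @ x, W_lv @ x

    # Analytic KL( Q(z|x) || N(0, I) ) for a diagonal Gaussian posterior
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

    # One reparametrized sample z ~ Q(z|x)
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(k)

    # Decoder pass: log p(x|z) for a unit-variance Gaussian decoder,
    # dropping the additive normalizing constant
    x_hat = W_dec @ z
    log_px_z = -0.5 * np.sum((x - x_hat) ** 2)

    # Single-sample estimate of the lower bound
    return log_px_z - kl

x = rng.normal(size=d)
print(elbo_single_sample(x))
```

The `x_hat` computed inside is exactly the "reconstruction" the answer refers to; the returned value is a one-sample estimate of the bound.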
What happens if you sample multiple times from $Q$? The immediate consequence is that you get a better approximation of the expectation, and hence a better approximation of the lower bound on the log probability. You also need to run the decoder multiple times, which can be expensive, so it is usually not done. Of course, if you do this, you end up with many reconstructions rather than just one. Since they are all decoded from samples of the same posterior, it is certainly possible to average them and still get a meaningful (if somewhat blurred) output.
So you probably just want to sample once.
In response to your edit, the correct way to write it would be
$$\begin{align*} E_{z \sim Q}[\log p(x|z)] &\approx \frac{1}{n}\sum_{i=1}^n \log p(x|z_i) \\
&\propto -\frac{1}{n}\sum_{i=1}^n ||x-\text{decode}(z_i)||_2^2 \\
&\propto -\frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m (x_j - \text{decode}(z_i)_j)^2\end{align*}$$
We would expect the reconstructions $\text{decode}(z_i)$ to look quite similar to each other, but not exactly the same. Exactly how similar they are depends on the nature of the data and how well the model is fitted.
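The $n$-sample estimate above translates directly into code. This sketch uses a hypothetical random linear decoder (standing in for a trained network) just to show the averaging over samples $z_i$ and dimensions $j$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 2  # hypothetical data and latent dimensions
W_dec = rng.normal(size=(d, k))  # placeholder linear "decoder"

def mc_estimate(x, mu, log_var, n=8):
    # n reparametrized samples z_i ~ Q(z|x) = N(mu, diag(exp(log_var)))
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal((n, k))
    # Rows are the n reconstructions decode(z_i), shape (n, d)
    recons = z @ W_dec.T
    # -(1/(n*m)) * sum_i sum_j (x_j - decode(z_i)_j)^2, with m = d here
    return -np.mean((x - recons) ** 2)

x = rng.normal(size=d)
print(mc_estimate(x, np.zeros(k), np.zeros(k)))
```

Increasing `n` lowers the variance of the estimate at the cost of `n` decoder passes, which is exactly the trade-off described above.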
Best Answer
The normal distribution is not the only distribution used for latent variables in VAEs. There are also works using the von Mises-Fisher distribution (hyperspherical VAEs [1]), and there are VAEs using Gaussian mixtures, which are useful for unsupervised [2] and semi-supervised [3] tasks.
The normal distribution has many nice properties: the KL divergence in the variational loss can be evaluated analytically, and we can use the reparametrization trick for efficient gradient computation (though the original VAE paper [4] names many other distributions for which this works). Moreover, one of the apparent advantages of VAEs is that they allow generation of new samples by sampling in the latent space, which is quite easy if it follows a Gaussian distribution. Finally, as @shimao remarked, it does not matter much which distribution the latent variables follow, since the non-linear decoder can mimic an arbitrarily complicated distribution of observations. It is simply convenient.
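To illustrate why the Gaussian prior makes generation so easy: new samples come from drawing $z \sim N(0, I)$ and decoding. The decoder here is again a hypothetical random linear map standing in for a trained (in practice non-linear) network:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 4, 2  # hypothetical data and latent dimensions
W_dec = rng.normal(size=(d, k))  # placeholder for a trained decoder

def generate(n_samples):
    # Sampling the latent prior N(0, I) is trivial for a Gaussian...
    z = rng.standard_normal((n_samples, k))
    # ...and the decoder maps the latent samples into data space.
    return z @ W_dec.T

samples = generate(5)
print(samples.shape)  # (5, 4)
```

With a von Mises-Fisher or mixture prior, only this sampling step changes; the decoding step is identical.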
As for the second question, I agree with @shimao's answer.
[1]: Davidson, T.R., Falorsi, L., De Cao, N., Kipf, T. and Tomczak, J.M., 2018. Hyperspherical variational auto-encoders. arXiv preprint arXiv:1804.00891.
[2]: Dilokthanakul, N., Mediano, P.A., Garnelo, M., Lee, M.C., Salimbeni, H., Arulkumaran, K. and Shanahan, M., 2016. Deep unsupervised clustering with gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648.
[3]: Kingma, D.P., Mohamed, S., Rezende, D.J. and Welling, M., 2014. Semi-supervised learning with deep generative models. In Advances in neural information processing systems (pp. 3581-3589).
[4]: Kingma, D.P. and Welling, M., 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.