This is basically a question about $p$-values and maximum likelihood. Let me quote Cohen (1994) here:
> What we want to know is "Given this data what is the probability that $H_0$ is true?" But as most of us know, what it [$p$-value] tells us is "Given that $H_0$ is true, what is the probability of this (or more extreme) data?" These are not the same (...)
So the $p$-value tells us $P(D|H_0)$, while what we are interested in is $P(H_0|D)$ (see also the discussion of the Fisherian vs. Neyman-Pearson frameworks).
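For intuition, here is a toy sketch in Matlab (the prior of 0.5 and the two likelihoods are made-up illustrative numbers, nothing from the question) showing that $P(D|H_0)$ and $P(H_0|D)$ can be very different:

```matlab
% Toy illustration with made-up numbers: P(D|H0) and P(H0|D) need not be close.
pH0   = 0.5;    % assumed prior probability that H0 is true
pD_H0 = 0.04;   % P(D|H0): data this extreme are unlikely under H0
pD_H1 = 0.05;   % P(D|H1): but barely more likely under the alternative

% Bayes' theorem: P(H0|D) = P(D|H0) P(H0) / P(D)
pH0_D = pD_H0 * pH0 / (pD_H0 * pH0 + pD_H1 * (1 - pH0));

fprintf('P(D|H0) = %.3f, yet P(H0|D) = %.3f\n', pD_H0, pH0_D)
```

Even though the "significance-like" quantity is below 0.05, under these arbitrary assumptions the null still has a posterior probability of about 0.44.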
Let's forget for a moment about $p$-values. The probability of observing our data given some parameter $\theta$ is the likelihood function
$$ L(\theta | D) = P(D|\theta) $$
that is one way of looking at statistical inference. Another way is the Bayesian approach, where we want to learn directly (rather than indirectly) about $P(\theta|D)$ by employing Bayes' theorem and using priors for $\theta$:
$$ \underbrace{P(\theta|D)}_\text{posterior} \propto \underbrace{P(D|\theta)}_\text{likelihood} \times \underbrace{P(\theta)}_\text{prior} $$
Now, if you look at the overall picture, you'll see that $p$-values and the likelihood answer different questions than Bayesian estimation does.
So, while maximum likelihood estimates coincide with MAP Bayesian estimates under uniform priors, you have to remember that they answer a different question.
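To make this concrete, here is a minimal Matlab sketch with made-up data (7 successes in 10 Bernoulli trials, a purely illustrative choice):

```matlab
% Minimal sketch with made-up data: 7 successes in 10 Bernoulli trials.
k = 7;  n = 10;
theta = linspace(0.001, 0.999, 9999);

lik       = theta.^k .* (1 - theta).^(n - k);   % likelihood L(theta|D) = P(D|theta)
priorFlat = ones(size(theta));                  % uniform prior on (0, 1)
priorInfo = theta.^19 .* (1 - theta);           % Beta(20, 2) prior, up to a constant

postFlat = lik .* priorFlat;   % posterior \propto likelihood * prior
postInfo = lik .* priorInfo;

[~, iML]   = max(lik);
[~, iFlat] = max(postFlat);
[~, iInfo] = max(postInfo);
fprintf('MLE = %.3f, MAP (flat prior) = %.3f, MAP (Beta(20,2) prior) = %.3f\n', ...
        theta(iML), theta(iFlat), theta(iInfo))
```

With the flat prior the posterior is just a rescaled likelihood, so its mode sits exactly at the MLE (0.7 here); the informative prior pulls the MAP estimate away from it, because it is answering a different question.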
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
I don't believe there is a single answer to this question.
When we consider possible distributional misspecification while applying maximum likelihood estimation, we get what is called the "Quasi-Maximum Likelihood" estimator (QMLE). In certain cases the QMLE is both consistent and asymptotically normal.
What it certainly loses, though, is asymptotic efficiency. This is because the asymptotic variance of $\sqrt n (\hat \theta - \theta)$ (this is the quantity that has an asymptotic distribution, not $\hat \theta$ itself) is, in all cases,
$$\text{Avar}[\sqrt n (\hat \theta - \theta)] = \text{plim}\Big( [\hat H]^{-1}[\hat S \hat S^T][\hat H]^{-1}\Big) \tag{1}$$
where $H$ is the Hessian matrix of the log-likelihood and $S$ is the gradient, and the hat indicates sample estimates.
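To make $(1)$ concrete, here is a hypothetical Matlab sketch (not from the question; it assumes the Statistics Toolbox for the random number generators): a Poisson quasi-MLE of a constant mean, fitted to deliberately overdispersed counts, with the sandwich variance built from the sample scores and Hessian:

```matlab
% Hypothetical sketch: Poisson quasi-MLE of a constant mean lambda, fit to
% overdispersed (mixed-Poisson) counts, so the Poisson model is misspecified.
rng(1);
n = 5000;
y = poissrnd(gamrnd(2, 2, n, 1));   % E(y) = 4 but Var(y) > E(y)

lamhat = mean(y);                   % the (quasi-)maximum likelihood estimate of lambda
s = y ./ lamhat - 1;                % per-observation scores at lamhat
H = -sum(y) / lamhat^2;             % summed Hessian of the log-likelihood at lamhat

vNaive    = -1 / H;                 % inverse of minus the summed Hessian: valid only under correct specification
vSandwich = (s.' * s) / H^2;        % sample analogue of eq. (1), scaled so it estimates Var(lamhat)

fprintf('naive SE = %.4f, robust (sandwich) SE = %.4f\n', sqrt(vNaive), sqrt(vSandwich))
```

Under a correctly specified Poisson model the two standard errors would agree asymptotically; with the overdispersed data above the sandwich standard error comes out noticeably larger than the naive one.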
Now, if we have correct specification, we get, first, that
$$\text{Avar}[\sqrt n (\hat \theta - \theta)] = (\mathbb E[H_0])^{-1}\mathbb E[S_0S_0^T](\mathbb E[H_0])^{-1} \tag{2}$$
where the "$0$" subscript denotes evaluation at the true parameters (and note that the middle term is the definition of Fisher Information), and second, that the "information matrix equality" holds and states that $-\mathbb E[H_0] = \mathbb E[S_0S_0^T]$, which means that the asymptotic variance will finally be
$$\text{Avar}[\sqrt n (\hat \theta - \theta)] = -(\mathbb E[H_0])^{-1} \tag{3}$$
which is the inverse of the Fisher information.
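As a quick sanity check of the information matrix equality in a correctly specified case, take a single Poisson observation with mean $\lambda$:
$$ \ln f(x|\lambda) = x\ln\lambda - \lambda - \ln x!, \qquad S_0 = \frac{x}{\lambda} - 1, \qquad H_0 = -\frac{x}{\lambda^2}, $$
so that
$$ \mathbb E[S_0S_0^T] = \frac{\operatorname{Var}(x)}{\lambda^2} = \frac{1}{\lambda} = \frac{\mathbb E[x]}{\lambda^2} = -\mathbb E[H_0], $$
and $(2)$ indeed collapses to $(3)$, the inverse Fisher information $\lambda$.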
But if we have misspecification, expression $(1)$ does not lead to expression $(2)$ (because the first and second derivatives in $(1)$ have been derived from the wrong likelihood). This in turn implies that the information matrix equality does not hold, that we do not end up at expression $(3)$, and that the (Q)MLE does not attain full asymptotic efficiency.
While this method doesn't do anything clever in terms of the structure or algorithm, it is quicker in Matlab to do the following:
Notice that \begin{equation} \hat{x}=\Sigma(\Sigma+\sigma^2 I)^{-1}y \end{equation}
or even better (after seeing Alexey's answer), since $\Sigma(\Sigma+\sigma^2 I)^{-1} = I - \sigma^2(\Sigma+\sigma^2 I)^{-1} = I - \left(I+\frac{1}{\sigma^2}\Sigma\right)^{-1}$, \begin{equation} \hat{x}=y - \left (I+\frac{1}{\sigma^2}\Sigma \right )^{-1}y \end{equation}
We can compare these in Matlab to the initial naive implementation using the code
I get answers of
To get any further significant speedup, I'm guessing you would have to start taking advantage of the structure of your matrices, if possible.
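For reference, here is a minimal sketch of what such a comparison could look like (hypothetical code, not the code or timings from the original post; it assumes a dense symmetric positive-definite $\Sigma$ and a scalar noise variance $\sigma^2$):

```matlab
% Hypothetical timing sketch: naive explicit inverse vs. the two rearranged forms.
n     = 4000;
sig2  = 0.5;
A     = randn(n);
Sigma = A * A.' / n + eye(n);     % a dense symmetric positive-definite "covariance"
y     = randn(n, 1);

tic, x1 = Sigma * inv(Sigma + sig2 * eye(n)) * y;  t1 = toc;   % naive explicit inverse
tic, x2 = Sigma * ((Sigma + sig2 * eye(n)) \ y);   t2 = toc;   % first form, using backslash
tic, x3 = y - (eye(n) + Sigma / sig2) \ y;         t3 = toc;   % second form

fprintf('naive %.2fs, form 1 %.2fs, form 2 %.2fs, max|x1 - x3| = %.2e\n', ...
        t1, t2, t3, max(abs(x1 - x3)))
```

Both rearranged forms replace the explicit inverse with a single backslash solve, which is where the speedup comes from; the exact timings will of course depend on the machine and on the size of $\Sigma$.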