Solved – Maximum a posteriori estimation with one single training example

Tags: bayesian, estimation, MATLAB

I am using maximum a posteriori (MAP) estimation to estimate $\mu$ and $\sigma$ from $N$ samples drawn from $\mathcal{N}(5, 1)$. The priors I place are $\mu\sim\mathcal{N}(5, 1)$ and $\sigma\sim\mathcal{N}(1, 1)$.

Taking the derivatives of the log-posterior with respect to $\mu$ and $\sigma$ and setting them to $0$, I get
\begin{align}
-\mu+5+\frac{1}{\sigma^2}\left(\sum\limits_{n=1}^{N}x_n-N\mu\right) &= 0 \\
\ \\
-\sigma+1-\frac{N}{\sigma}+\frac{\sum_{n=1}^{N}(x_n-\mu)^2}{\sigma^3} &= 0,
\end{align}
which can be solved by plugging in $N$ data points $\{x_1,x_2,\ldots,x_N\}$.

My problem occurs when $N=1$, i.e., when I only have one data point available for MAP. Say that point is $5.1$. Plugging it in, I solve the system in MATLAB:

syms m s;
% Stationary-point equations, multiplied through by s^2 and s^3
% respectively; assume sigma (s) is non-zero
S = solve([ ...
    s^2*(m-5)-5.1+m == 0, ...
    -s^4+s^3-s^2+(5.1-m)^2 == 0], ...
    [m, s]);

mus_hat = double(S.m);
sigmas_hat = double(S.s);

All the solutions are complex and hence cannot be correct.
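To see why no real root can exist, one can eliminate $\mu$ by hand: the first equation gives $\mu = (5\sigma^2+5.1)/(\sigma^2+1)$, and substituting into the second and clearing denominators leaves a degree-6 polynomial in $\sigma$ alone. Here is a sketch of that check in Python with SymPy (an equivalent of the MATLAB computation above, not the thread's own code):

```python
import sympy as sp

s = sp.symbols('s')

# From the first stationary-point equation: m*(s^2 + 1) = 5*s^2 + 5.1
m_expr = (5*s**2 + sp.Rational(51, 10)) / (s**2 + 1)

# Second stationary-point equation with m eliminated
eq2 = -s**4 + s**3 - s**2 + (sp.Rational(51, 10) - m_expr)**2

# Clear the (s^2 + 1)^2 denominator and drop the spurious s = 0 double root
expr = sp.cancel(eq2 * (s**2 + 1)**2)
poly = sp.Poly(sp.cancel(expr / s**2), s)

# All six remaining roots turn out to be non-real
roots = poly.nroots()
print(roots)
```

Every root of this polynomial has a nonzero imaginary part, matching what MATLAB's `solve` reports for $\sigma\neq 0$.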

I understand that the prior $\sigma\sim\mathcal{N}(1, 1)$ might be inappropriate, since a normal prior puts mass on negative values of $\sigma$. But how does this prior cause all the solutions to be complex? I can't really see the link. Is there an intuitive explanation for this?

Best Answer

If we write down the posterior distribution on $(\mu,\sigma)$ associated with a single observation $x$, $$\pi(\mu,\sigma|x)\propto\sigma^{-1}\exp\frac{-1}{2}\left\{\sigma^{-2}(x-\mu)^2 +(\mu-5)^2+(\sigma-1)^2\right\}\mathbb{I}_{\mathbb{R}^*_+}(\sigma)$$ (as $\sigma$ is necessarily positive), then along the slice $\mu=x$ the term $\sigma^{-2}(x-\mu)^2$ vanishes and the function reduces to $$\pi(x,\sigma|x)\propto\sigma^{-1}\exp\frac{-1}{2}\left\{(x-5)^2+(\sigma-1)^2\right\}\mathbb{I}_{\mathbb{R}^*_+}(\sigma)\,.$$ As $\sigma\to 0^+$, the exponential factor tends to the positive constant $\exp\{-\frac{1}{2}[(x-5)^2+1]\}$ while $\sigma^{-1}$ diverges, so $$\lim_{\sigma\to 0^+} \pi(x,\sigma|x)=+\infty\,.$$ The posterior is unbounded, which demonstrates there is no MAP in this setting; the complex roots you found reflect the fact that the posterior has no stationary point on $\sigma>0$.
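The divergence along $\mu=x$ is easy to check numerically. The following sketch (in Python rather than the thread's MATLAB) evaluates the unnormalised posterior at $\mu=x$ for shrinking $\sigma$ and shows it growing without bound:

```python
import math

x = 5.1  # the single observation

def posterior_at_mu_eq_x(sigma):
    # Unnormalised posterior along the slice mu = x: the likelihood's
    # squared-error term vanishes, leaving
    # sigma^{-1} * exp(-((x - 5)^2 + (sigma - 1)^2) / 2)
    return (1.0 / sigma) * math.exp(-0.5 * ((x - 5)**2 + (sigma - 1)**2))

# Evaluate at sigma = 0.1, 0.01, ..., 1e-5
values = [posterior_at_mu_eq_x(10.0**(-k)) for k in range(1, 6)]
print(values)
```

Each tenfold shrink of $\sigma$ roughly multiplies the value by ten, since the exponential factor is essentially constant near $\sigma=0$.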
