Ok. $\Sigma$ is the $3\times 3$ covariance matrix of the joint distribution of your three variables (just look at the article). It is then partitioned into sub-matrices of unequal sizes. Denoting by $v_{ij}$ the elements of $\Sigma$, we have, following the notation of the article,
$$\Sigma_{11} = v_{11}=\sigma^2_1\;,\; \Sigma_{12} = [v_{12}\;\; v_{13}]\;,\; \Sigma_{21} = \left[ \begin{matrix} v_{21}\\ v_{31}\end{matrix} \right]$$
$$\Sigma_{22} =\left[\begin{matrix} v_{22} & v_{23} \\ v_{32} & v_{33} \end{matrix}\right] = \left[\begin{matrix} \sigma^2_{2} & v_{23} \\ v_{32} & \sigma^2_{3} \end{matrix}\right]$$
Then the conditional expectation function $E(X_1\mid X_2,X_3)$ is
$$E(X_1\mid X_2,X_3) = \mu_1 + \Sigma_{12}\Sigma^{-1}_{22}\left[\begin{matrix} X_2-\mu_2 \\ X_3-\mu_3\end{matrix}\right]$$
I guess now you can work out the expression for the conditional variance.
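If you want to verify the algebra numerically, here is a minimal R sketch with an illustrative (made-up) mean vector and covariance matrix; the conditional variance uses the analogous formula $\Sigma_{11} - \Sigma_{12}\Sigma^{-1}_{22}\Sigma_{21}$.

mu    <- c(1, 2, 3)                       # illustrative means
Sigma <- matrix(c(4.0, 1.2, 0.8,
                  1.2, 9.0, 2.0,
                  0.8, 2.0, 1.0), 3, 3)   # illustrative covariance matrix
S12 <- Sigma[1, 2:3, drop = FALSE]        # Sigma_12, 1 x 2
S21 <- Sigma[2:3, 1, drop = FALSE]        # Sigma_21, 2 x 1
S22 <- Sigma[2:3, 2:3]                    # Sigma_22, 2 x 2
x23 <- c(2.5, 3.5)                        # observed values of (X2, X3)
mu[1] + S12 %*% solve(S22) %*% (x23 - mu[2:3])  # conditional mean
Sigma[1, 1] - S12 %*% solve(S22) %*% S21        # conditional variance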
Let's start with a very simple example of Bayesian inference that touches on some of the issues you raise. Then you will have a framework for follow-up questions and for raising additional issues.
A political consultant is hired to advise one candidate in an upcoming election. From prior experience with other elections and some knowledge of the candidate, the consultant has the prior distribution $\mathsf{Beta}(330, 270)$ for the probability $\theta$ that the candidate will win. That is, the consultant thinks the probability the candidate will win is roughly 0.55 and likely between 0.51 and 0.59. Computation in R:
330/(330+270)
[1] 0.55 # mean of Beta(330, 270)
qbeta(c(.025, .975), 330, 270)
[1] 0.5100824 0.5896018
The prior density is
$p(\theta) \propto \theta^{330-1}(1-\theta)^{270-1}.$
Choosing the prior distribution is often at least partially a matter of opinion. The consultant might have been just as happy with another similar beta distribution as her prior.
Results of a public opinion poll by a reputable pollster show that $x = 620$ out of $n = 1000$ randomly chosen likely voters favor the candidate. Thus the binomial likelihood is
$L(x\mid\theta) \propto \theta^{620}(1-\theta)^{1000-620}.$
Then by Bayes' Theorem, the posterior distribution is proportional to the product of prior and likelihood:
$$g(\theta\mid x) \propto \theta^{330-1}(1-\theta)^{270-1}\times\theta^{620}(1-\theta)^{1000-620} \\
= \theta^{330 + 620 - 1}(1-\theta)^{270 + 1000 - 620 - 1}\\
= \theta^{950-1}(1-\theta)^{650-1},$$
where we recognize the last term as proportional to the density function of $\mathsf{Beta}(950, 650).$ Information in this posterior distribution is a melding of information in the prior distribution and in the data.
In this case, it is easy to find the posterior distribution because the binomial likelihood is 'conjugate to' (mathematically compatible with) the beta density of the prior distribution.
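In R, the conjugate update just adds successes and failures to the prior parameters; a minimal sketch:

a <- 330; b <- 270   # prior Beta(a, b)
x <- 620; n <- 1000  # poll data: x successes in n trials
a + x                # first posterior parameter: 950
b + n - x            # second posterior parameter: 650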
A 95% Bayesian probability interval $(0.570, 0.618)$ for $\theta$ can be found by cutting 2.5% of the probability from each tail of the posterior distribution. Possible point estimates are the mean, median, or mode (in this case, all about 0.594).
qbeta(c(.025, .975), 950, 650)
[1] 0.5695848 0.6176932
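The point estimates can be checked directly:

950/(950 + 650)           # posterior mean
qbeta(.5, 950, 650)       # posterior median
(950 - 1)/(950 + 650 - 2) # posterior mode; all approximately 0.594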
Here is a plot of the prior and posterior distributions. The 95% posterior probability interval is shown by dashed lines.
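A sketch of R code that would draw such a plot (the styling choices here are illustrative, not necessarily those of the original figure):

theta <- seq(0.40, 0.75, by = 0.001)
plot(theta, dbeta(theta, 950, 650), type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density")      # posterior
lines(theta, dbeta(theta, 330, 270), col = "gray")    # prior
abline(v = qbeta(c(.025, .975), 950, 650), lty = 2)   # 95% interval, dashed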
So data from the poll together with the prior distribution show a slightly more favorable standing of the candidate than did the prior distribution.
Notes: (1) If the prior distribution in this example had been the 'noninformative' Jeffreys prior $\mathsf{Beta}(.5,.5),$ then the 95% Bayesian posterior interval would have been nearly the same (numerically) as a frequentist 95% confidence interval (but Bayesians and frequentists interpret interval estimates somewhat differently).
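One way to check this numerically (the Wald interval below is just one common frequentist choice):

qbeta(c(.025, .975), .5 + 620, .5 + 380)  # posterior interval under the Jeffreys prior
p <- 620/1000
p + c(-1.96, 1.96)*sqrt(p*(1 - p)/1000)   # Wald 95% confidence interval
# both are roughly (0.590, 0.650)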
(2) A conjugate prior distribution for a Poisson likelihood function is a gamma distribution. Similarly, normal prior distributions are conjugate to normal likelihood functions (for the mean, with known variance).
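A minimal sketch of the Poisson-gamma update, with made-up counts:

a <- 2; b <- 1         # illustrative prior Gamma(a, b) for the rate lambda
x <- c(3, 5, 4, 6, 2)  # illustrative Poisson counts
a + sum(x)             # posterior shape
b + length(x)          # posterior rate: posterior is Gamma(a + sum(x), b + n)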
(3) Reference: Suess & Trumbo (2010), Introduction to Probability Simulation and Gibbs Sampling with R, Springer. The example shown above is similar to one found in Chapter 8 of this book.
There is a mistake in the fourth formula, the one you are trying to understand (this is apparent from the last formula, where the mistake disappears). Specifically, it should read $$\hat{f} \left( x \right) = \frac{\hat{f}_{\text{trans}} \left( T_{\hat{\alpha}, \hat{M}, \hat{c}} \left( x \right) \right)}{\left| \left( T^{- 1}_{\hat{\alpha}, \hat{M}, \hat{c}} \right)' \left( x \right) \right|}$$ and not $$\hat{f} \left( x \right) = \frac{\hat{f}_{\text{trans}} \left( T_{\hat{\alpha}, \hat{M}, \hat{c}} \left( x \right) \right)}{\left| \left( T^{- 1}_{\hat{\alpha}, \hat{M}, \hat{c}} \right)' \left( T_{\hat{\alpha}, \hat{M}, \hat{c}} \left( x \right) \right) \right|}$$
The notation in these formulas is clumsy and not very intuitive, but I will explain how the formula is derived and where the mistake occurs.
The relation between the two random variables $Y$ and $X$ is $Y = T(X)$ (I will write $T$ for $T_{\hat{\alpha}, \hat{M}, \hat{c}}$ to simplify notation). The transformation $T$ is the cumulative distribution function of an absolutely continuous random variable, so it is strictly monotonically increasing with a unique inverse $T^{-1}$. Let $t(x) = T'(x) = \frac{\partial T(x)}{\partial x}$ be the density corresponding to $T$, and denote by $f_X$ and $f_Y$ the densities of $X$ and $Y$. The relation between the two densities is
$$f_X(x) = f_Y(T(x))\, t(x)$$
$$f_Y(y) = f_X(T^{-1}(y)) \frac{1}{t(T^{-1}(y))}$$
This is clear because the Jacobian of the transformation $X = T^{-1}(Y)$ is $t(x)$ (and not $t(T(x))$, as is assumed in the fourth formula). The term $\left| \frac{1}{(T^{-1})'(T(x))} \right|$ appears mistakenly in the fourth formula because
$$t(T(x)) = \left| t(T(x)) \right| = \left| T'(T(x)) \right| = \left| \frac{1}{(T^{-1})'(T(x))} \right|.$$
(Notice that the derivative of the inverse function is the reciprocal of the derivative of the original function. Also, there is no need for the absolute values here, because densities are positive. To quickly check the error, notice that $T\colon (0,\infty) \rightarrow (0,1)$ while $t\colon (0,\infty) \rightarrow (0,\infty)$.) In the last formula, the last term is $t(x)$ and not the incorrect $t(T(x))$.
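A quick numerical check in R of the $t(x)$-versus-$t(T(x))$ distinction, using an illustrative transformation and an illustrative density for $X$ (both made up for the check):

T_fun <- function(x) 1 - exp(-x)                    # illustrative cdf: (0, Inf) -> (0, 1)
t_fun <- function(x) exp(-x)                        # its derivative, the density t
Tinv  <- function(y) -log(1 - y)                    # inverse transformation
f_X   <- function(x) dgamma(x, shape = 2, rate = 1) # a known density for X
f_Y   <- function(y) f_X(Tinv(y)) / t_fun(Tinv(y))  # density of Y = T(X)
x <- 1.7
all.equal(f_X(x), f_Y(T_fun(x)) * t_fun(x))         # TRUE: the Jacobian is t(x)
isTRUE(all.equal(f_X(x), f_Y(T_fun(x)) * t_fun(T_fun(x))))  # FALSE: t(T(x)) is wrong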