With the following distributions on our random vectors:
$\mathbf x_i \mid \mu \sim N(\mu, \mathbf \Sigma)$
$\mu \sim N(\mu_0, \mathbf \Sigma_0)$
By Bayes's rule the posterior distribution looks like:
$p(\mu| \{\mathbf x_i\}) \propto p(\mu) \prod_{i=1}^N p(\mathbf x_i | \mu)$
So:
$\ln p(\mu| \{\mathbf x_i\}) = -\frac{1}{2}\sum_{i=1}^N(\mathbf x_i - \mu)'\mathbf \Sigma^{-1}(\mathbf x_i - \mu) -\frac{1}{2}(\mu - \mu_0)'\mathbf \Sigma_0^{-1}(\mu - \mu_0) + const$
$ = -\frac{1}{2} N \mu' \mathbf \Sigma^{-1} \mu + \sum_{i=1}^N \mu' \mathbf \Sigma^{-1} \mathbf x_i -\frac{1}{2} \mu' \mathbf \Sigma_0^{-1} \mu + \mu' \mathbf \Sigma_0^{-1} \mu_0 + const$
$ = -\frac{1}{2} \mu' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) \mu + \mu' (\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i) + const$
$= -\frac{1}{2}(\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i))' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) (\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i)) + const$
Which is the log density of a Gaussian:
$\mu| \{\mathbf x_i\} \sim N((N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i), (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1})$
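If you want to sanity-check the algebra numerically, here is a minimal NumPy/SciPy sketch (the covariances, prior mean, and data are all made up for illustration): the unnormalized log posterior and the claimed Gaussian log density should differ by the same constant (the log evidence) at every $\mu$.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, N = 3, 50

# Made-up SPD covariances and prior mean (illustrative only).
A = rng.standard_normal((d, d)); Sigma  = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.standard_normal(d)
X = rng.multivariate_normal(rng.standard_normal(d), Sigma, size=N)

# Posterior parameters from the derivation above.
post_cov  = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
post_mean = post_cov @ (np.linalg.inv(Sigma0) @ mu0
                        + np.linalg.inv(Sigma) @ X.sum(axis=0))

# log p(mu) + sum_i log p(x_i | mu) minus the claimed posterior log density
# should be the same constant (the log evidence) at any mu.
for mu in rng.standard_normal((3, d)):
    diff = (multivariate_normal.logpdf(mu, mu0, Sigma0)
            + multivariate_normal.logpdf(X, mu, Sigma).sum()
            - multivariate_normal.logpdf(mu, post_mean, post_cov))
    print(diff)  # same value each time, up to floating point
```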
Using the Woodbury identity on our expression for the covariance matrix:
$(N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1} = \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0$
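This identity is easy to verify numerically with arbitrary SPD matrices; a minimal sketch (the matrix names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 4, 25

# Arbitrary SPD matrices standing in for Sigma and Sigma_0.
A = rng.standard_normal((d, d)); Sigma  = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)

lhs = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
rhs = Sigma @ np.linalg.inv(Sigma / N + Sigma0) @ (Sigma0 / N)
print(np.allclose(lhs, rhs))  # True
```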
This gives the covariance matrix in the form the OP wanted. Using this expression (and, since the covariance is symmetric, its transposed form $\frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma$) in the expression for the mean, we have:
$\mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0 \mathbf \Sigma_0^{-1} \mu_0 + \frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i$
$= \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mu_0 + \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \sum_{i=1}^N \mathbf x_i$
Which is the form required by the OP for the mean.
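And a corresponding check that the rewritten mean agrees with the direct expression, again with made-up inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 40

# Hypothetical inputs, as before.
A = rng.standard_normal((d, d)); Sigma  = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.standard_normal(d)
X = rng.multivariate_normal(rng.standard_normal(d), Sigma, size=N)

# Direct posterior mean.
direct = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0)) @ (
    np.linalg.inv(Sigma0) @ mu0 + np.linalg.inv(Sigma) @ X.sum(axis=0))

# Rewritten form: only one matrix inverse needed.
M = np.linalg.inv(Sigma / N + Sigma0)
rewritten = Sigma @ M @ (mu0 / N) + Sigma0 @ M @ X.mean(axis=0)

print(np.allclose(direct, rewritten))  # True
```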
What @whuber and I were saying to you in the comments amounts to the same thing. @whuber pointed out that the text you are reading makes the points column vectors, while I stuck to your own original notation, where points are row vectors (that way of presenting is more common). When points are columns, the transposed matrix ("T" in the text, or just ' in my notation) is the right-hand multiplier; when points are rows, it is the left-hand one. Instead of multiplying separate vectors, it's more convenient to multiply whole matrices. See it with your data (matrix A = your "Ck"); a sketch reproducing these numbers follows after the output:
****** Points are rows, variables are columns [more common] ******
A
1 2 3
4 5 6
Column-centered A
-1.500000000 -1.500000000 -1.500000000
1.500000000 1.500000000 1.500000000
A'A, the scatter matrix
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
****** Points are columns, variables are rows [that's how in your book] ******
A
1 4
2 5
3 6
Row-centered A
-1.500000000 1.500000000
-1.500000000 1.500000000
-1.500000000 1.500000000
AA', the scatter matrix
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
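If it helps, a short NumPy sketch that reproduces the output above (variable names are mine):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # points are rows

Ac = A - A.mean(axis=0)             # column-center
print(Ac.T @ Ac)                    # A'A: the 3x3 scatter matrix of 4.5's

B = A.T                             # points are columns, variables are rows
Br = B - B.mean(axis=1, keepdims=True)   # row-center
print(Br @ Br.T)                    # BB': the same 3x3 scatter matrix
```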
I think you would be much better off using matrix algebra throughout.
As for your first question, remember that the fits in a linear model are obtained as:
$$\hat{y} = H y$$
where $H = X(X'X)^{-1}X'$ is the hat matrix, so that $\hat{y}_i = \sum_j h_{ij} y_j$.
Intuitively, $h_{ii}=1$ means that observation $y_i$ fully determines $\hat{y}_{i}$, so in a sense it has maximum leverage. If $h_{ii} \approx 0$, observation $y_i$ plays very little role in determining $\hat{y}_{i}$, which is then mostly determined by the rest of the observations.
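A minimal sketch with a made-up design matrix (not your data), showing the $h_{ii}$ and the fact that they sum to the number of parameters $p$, so their average is $p/n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3

# Made-up design matrix with an intercept column (illustrative only).
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: yhat = H y
leverage = np.diag(H)                  # the h_ii values

print(leverage.min(), leverage.max())  # with an intercept, 1/n <= h_ii <= 1
print(np.isclose(leverage.sum(), p))   # trace(H) = p, so mean leverage is p/n
```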