One way is to first expand the right summand in the first equation, i.e. perform the vector-matrix multiplications.
Then modify the result using, for example, the identity given here:
Inverse of a sum of symmetric matrices.
Further modify using the matrix inversion theorem (here the special case $U=V=I$); this step is actually shown on the next page of one of the sources given in the question (http://www.gaussianprocess.org/gpml/chapters/RWA.pdf, p. 201, 1st paragraph).
After further rearranging and simplifying, one is left with the second equation plus the two left summands of the first equation. Plugging this result into the first equation and performing the subtractions therefore gives the second equation (times $-1$, which had been ignored here).
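As a quick numerical sanity check of the matrix inversion theorem in the special case $U=V=I$, i.e. $(A+B)^{-1} = A^{-1} - A^{-1}\,(A^{-1}+B^{-1})^{-1}\,A^{-1}$, here is a small NumPy sketch (the matrix size and the random positive-definite matrices are illustrative choices, not taken from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random symmetric positive-definite matrices (so all inverses exist).
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)
M = rng.standard_normal((4, 4))
B = M @ M.T + 4 * np.eye(4)

Ai = np.linalg.inv(A)
Bi = np.linalg.inv(B)

# Matrix inversion lemma (Woodbury identity) with U = V = I:
#   (A + B)^{-1} = A^{-1} - A^{-1} (A^{-1} + B^{-1})^{-1} A^{-1}
lhs = np.linalg.inv(A + B)
rhs = Ai - Ai @ np.linalg.inv(Ai + Bi) @ Ai

# Agreement up to floating-point error confirms the identity numerically.
max_err = np.max(np.abs(lhs - rhs))
```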
My (subjective) opinion:
Do write
$$
\mathcal{N}(x \mid \mu_1,\sigma_1^2)\cdot\mathcal{N}(x \mid \mu_2,\sigma_2^2)
$$
Don't write
$$
\mathcal{N}(\mu_1,\sigma_1^2)\,\mathcal{N}(\mu_2, \sigma_2^2)
$$
The notation $\mathcal{N}(x \mid \mu_1,\sigma_1^2)$ makes it clear that this is a function. This particular function can have a particular significance in a statistical or probabilistic setting, but that need not be the case: it is simply a function, and the product of functions is a well-defined and familiar operation with little ambiguity.
On the other hand, the notation $\mathcal{N}(\mu, \sigma^2)$ is shorthand used to make $X \sim \mathcal{N}(\mu, \sigma^2)$ equivalent to "the random variable $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$".
In this setting $\mathcal{N}(\mu, \sigma^2)$ is not a function; it is not even a distribution function. It is a symbolic representation of a particular statement, a signpost that allows us to retrieve certain information should we wish, but it is not a mathematical object for which we have a well-defined product.
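To illustrate that the product of densities is indeed a well-defined and well-behaved operation, here is a NumPy sketch of the standard identity $\mathcal{N}(x \mid \mu_1,\sigma_1^2)\cdot\mathcal{N}(x \mid \mu_2,\sigma_2^2) = Z\cdot\mathcal{N}(x \mid \mu,\sigma^2)$ with precision-weighted parameters; the specific numbers are arbitrary illustrative choices:

```python
import numpy as np

def npdf(x, mu, var):
    # Univariate normal density N(x | mu, var)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

mu1, var1 = 1.0, 2.0
mu2, var2 = -0.5, 0.5

# Parameters of the (unnormalized) Gaussian product:
#   1/var = 1/var1 + 1/var2,  mu = var * (mu1/var1 + mu2/var2)
# and the normalizing constant Z = N(mu1 | mu2, var1 + var2).
var = 1.0 / (1.0 / var1 + 1.0 / var2)
mu = var * (mu1 / var1 + mu2 / var2)
Z = npdf(mu1, mu2, var1 + var2)

# Compare both sides of the identity pointwise on a grid.
x = np.linspace(-4.0, 4.0, 9)
lhs = npdf(x, mu1, var1) * npdf(x, mu2, var2)
rhs = Z * npdf(x, mu, var)
max_err = np.max(np.abs(lhs - rhs))
```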
Best Answer
Perhaps the easiest way to understand convolution, in the context of probability distributions, is in terms of the sum of independent random variables. Suppose that independent random variables $X_1$ and $X_2$ have distributions $d_1$ and $d_2$ respectively. Then $X_1+X_2$ has distribution given by the convolution of $d_1$ and $d_2$. For more details on this, see for example these notes (they only deal with the univariate case, but the same concepts apply equally in multivariate situations).
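For the univariate case, the connection can be checked numerically: convolving two Gaussian densities on a grid, $(d_1 * d_2)(z) = \int d_1(t)\,d_2(z-t)\,dt$, reproduces the density of the sum, $\mathcal{N}(z \mid \mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$. A NumPy sketch (means, variances, grid, and evaluation points are illustrative choices):

```python
import numpy as np

def npdf(x, mu, var):
    # Univariate normal density N(x | mu, var)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

mu1, var1 = 1.0, 2.0
mu2, var2 = -3.0, 0.7

# Numerical convolution: (d1 * d2)(z) = \int d1(t) d2(z - t) dt,
# approximated by a Riemann sum over a grid wide enough to cover
# essentially all of the probability mass.
t = np.linspace(-30.0, 30.0, 20001)
dt = t[1] - t[0]
d1 = npdf(t, mu1, var1)
z = np.array([-4.0, -2.0, 0.0, 2.0])
conv = np.array([np.sum(d1 * npdf(v - t, mu2, var2)) * dt for v in z])

# Closed form for the density of the sum of independent Gaussians.
closed = npdf(z, mu1 + mu2, var1 + var2)
max_err = np.max(np.abs(conv - closed))
```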
In the case of two multivariate Gaussians, it is well known (e.g. by considering characteristic functions) that the sum of independent $X\sim\mathcal N(\mu_X,\Sigma_X)$ and $Y\sim\mathcal N(\mu_Y,\Sigma_Y)$ is just $X+Y\sim\mathcal N(\mu_X+\mu_Y,\Sigma_X+\Sigma_Y)$. So all you need to do is add the mean vectors and covariance matrices.
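A simple Monte Carlo sanity check of the multivariate statement, sketched in NumPy (the dimensions, means, and covariance matrices are arbitrary illustrative choices): drawing independent samples of $X$ and $Y$ and summing them should give empirical mean $\mu_X+\mu_Y$ and empirical covariance $\Sigma_X+\Sigma_Y$, up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(42)

mu_x = np.array([1.0, -2.0])
mu_y = np.array([0.5, 3.0])
Sigma_x = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_y = np.array([[0.5, -0.1], [-0.1, 0.8]])

# Independent draws of X and Y, then the elementwise sum.
n = 200_000
X = rng.multivariate_normal(mu_x, Sigma_x, size=n)
Y = rng.multivariate_normal(mu_y, Sigma_y, size=n)
S = X + Y

# Empirical moments of the sum vs. the closed-form parameters.
emp_mean = S.mean(axis=0)
emp_cov = np.cov(S, rowvar=False)
mean_err = np.max(np.abs(emp_mean - (mu_x + mu_y)))
cov_err = np.max(np.abs(emp_cov - (Sigma_x + Sigma_y)))
```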
An alternative (more direct but less illuminating) approach can be found in this document.