Solved – Prior for a linear transformation matrix: Matrix Normal Distribution

conjugate-prior, linear algebra, modeling, normal distribution

I have been trying to derive the conditional distribution of the parameters of a linear transformation (represented as a matrix), and I had a lot of help on this thread yesterday. However, I have realised I may have made a serious mistake.

So, I have a vector-valued observation $y$, which is modelled as a linear transformation of another observation $x$.

The way $y$ is modelled is:

$$
y \sim \mathrm{N} (Ax, \Sigma).
$$

The thing to note here is that $A$ is a matrix. I want to put a normal prior on the transformation parameters, i.e. the entries of $A$, which I do as:

$$
A \sim \mathrm{N} (A_0, \nabla).
$$
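
(For concreteness, let me spell out the dimensions: suppose $y \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$, so that $A$ is $m \times n$. The prior above is then really on the stacked entries $\mathrm{vec}(A) \in \mathbb{R}^{mn}$, with mean $\mathrm{vec}(A_0)$ and an $mn \times mn$ covariance matrix $\nabla$.)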

So here I am treating $A$ as a vector, while in the likelihood it appears as a matrix. When I tried to get the conditional distribution of $A$ by multiplying the two Gaussians, I ran into trouble because of this matrix-vector discrepancy: I could not separate the terms properly, as Glen_b suggested in that thread (because my question contained the mistake and I did not realise it at the time).

Is there a way to deal with this so that I can still derive the conditional distribution in closed form? Perhaps what I have done is valid and I just need the right linear-algebra tricks to make it work, but I suspect that is not the case.

On a more hopeful note, I see that there is the Matrix Normal Distribution. Would it be possible to use this as a prior for the transformation matrix $A$, and would it still be possible to get a closed-form conditional posterior? I am sure this object is more complex to manipulate, but perhaps someone with more expertise can confirm whether this is a good road to go down.

Best Answer

If you observe that $Ax = (x^\top \otimes I_m)\,\mathrm{vec}(A)$, then you can put a prior over $A$ (the matrix-normal prior is just a Gaussian prior on $\mathrm{vec}(A)$), and you can do proper Gaussian inference in closed form. As far as I am aware, you need to impose a special structure on the prior covariance matrix to keep the inference tractable. Regarding the posterior updates, you can find exactly what you are looking for here (Sec. II-B, the Inference section). Note the special structure of the covariance matrix.
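
Concretely, here is a sketch of the standard conjugate update (my own working, not taken from the reference). With $\Phi_i = x_i^\top \otimes I_m$ for observation pairs $(x_i, y_i)$, the conditional posterior of $\mathrm{vec}(A)$ is Gaussian with

$$
\Sigma_A = \Big( \nabla^{-1} + \sum_i \Phi_i^\top \Sigma^{-1} \Phi_i \Big)^{-1},
\qquad
\mu_A = \Sigma_A \Big( \nabla^{-1}\,\mathrm{vec}(A_0) + \sum_i \Phi_i^\top \Sigma^{-1} y_i \Big),
$$

where $\nabla$ is the prior covariance of $\mathrm{vec}(A)$ from the question (a matrix-normal prior $\mathrm{MN}(A_0, U, V)$ corresponds to $\nabla = V \otimes U$). A minimal NumPy sketch of this update, with made-up dimensions and simulated data purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 3, 2, 50                       # dim(y), dim(x), number of observations

A_true = rng.standard_normal((m, n))     # "ground truth" A, used only to simulate data
Sigma = 0.1 * np.eye(m)                  # observation noise covariance
X = rng.standard_normal((N, n))
Y = X @ A_true.T + rng.multivariate_normal(np.zeros(m), Sigma, size=N)

# Prior on vec(A): mean vec(A_0), covariance V0 (the question's "nabla").
A0 = np.zeros((m, n))
V0 = 10.0 * np.eye(m * n)

Sigma_inv = np.linalg.inv(Sigma)
precision = np.linalg.inv(V0)                       # accumulate posterior precision
linear = precision @ A0.flatten(order="F")          # vec() stacks columns -> order="F"

for x, y in zip(X, Y):
    Phi = np.kron(x[None, :], np.eye(m))            # Phi_i = x_i^T kron I_m, shape m x (m*n)
    precision = precision + Phi.T @ Sigma_inv @ Phi
    linear = linear + Phi.T @ Sigma_inv @ y

post_cov = np.linalg.inv(precision)                 # posterior covariance of vec(A)
post_mean = post_cov @ linear                       # posterior mean of vec(A)
A_post = post_mean.reshape((m, n), order="F")       # fold back into a matrix

print(np.round(A_post - A_true, 2))                 # should be close to the zero matrix
```

For large $mn$ the naive Kronecker construction above gets expensive; the identities $\Phi_i^\top \Sigma^{-1} \Phi_i = (x_i x_i^\top) \otimes \Sigma^{-1}$ and $\Phi_i^\top \Sigma^{-1} y_i = \mathrm{vec}(\Sigma^{-1} y_i x_i^\top)$ are what make a Kronecker-structured (matrix-normal) prior attractive, which is presumably the special structure referred to above.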