[Math] matrix derivative relation to vec operator and kronecker product

Tags: linear-algebra, matrices, matrix-calculus

I'm trying to understand how to take a matrix derivative and its connection with the Kronecker product / vec operator.

Setup

$L \in \mathbb{R}^{d \times k}$, $\theta \in \mathbb{R}^{d \times 1}$, $s \in \mathbb{R}^{k \times 1}$, $D \in \mathbb{R}^{d \times d}$, $\lambda \in \mathbb{R}$.

Optimization Problem
$$\min_{L} \lambda||L||_{F}^{2} + (\theta-Ls)^{T}D(\theta-Ls) $$

To solve this, I know I need to set the gradient with respect to $L$ to zero and solve for $L$. In other words, I solve for $L$ in the following equation
$$ 0 = \frac{\partial}{\partial L} \lambda||L||_{F}^{2} + \frac{\partial}{\partial L} (\theta-Ls)^{T}D(\theta-Ls)$$
$$ = 2\lambda L + \frac{\partial}{\partial L} (\theta-Ls)^{T}D(\theta-Ls)$$

The problem is I don't quite understand how to evaluate the 2nd term in the summation above.

The solution is given to be
$$ \mathrm{vec}(L) = A^{-1}b$$ where
$$ A = \lambda I_{dk \times dk} + ss^{T} \otimes D$$
$$ b = \mathrm{vec}(s^{T} \otimes \theta^{T}D) $$

Confusion
Could someone kindly explain how to arrive at these solutions? I think my confusion stems from a general discomfort with the relationship between Kronecker products, the vec operator, and matrix derivatives. I'd really appreciate some intuition as to how those pieces fit together. In addition, I'm hoping someone can point me to some resources where I can learn more about these ideas.

Best Answer

Define the vector $y=(Ls-\theta)$. Then, using this vector and the Frobenius product $A:B = {\rm tr}(A^TB)$, rewrite the function and find its differential (note the second step uses the assumption that $D$ is symmetric, so that $D:{\rm sym}(X) = D:X$):
$$\eqalign{ f &= D:yy^T + \lambda L:L \cr df &= D:2\,{\rm sym}(dy\,y^T) + 2\,\lambda L:dL \cr &= 2\,D:dy\,y^T + 2\,\lambda L:dL \cr &= 2\,Dy:dy + 2\,\lambda L:dL \cr &= 2\,Dy:dL\,s + 2\,\lambda L:dL \cr &= 2\,(Dys^T + \lambda L):dL \cr }$$
Since $df=\frac{\partial f}{\partial L}:dL,\,$ the gradient is
$$\eqalign{ \frac{\partial f}{\partial L} &= 2\,(Dys^T + \lambda L) \cr }$$
Setting the gradient to zero yields
$$\eqalign{ \lambda L &= D(\theta-Ls)s^T \cr D\theta s^T &= \lambda L+DLss^T \cr }$$
At this point, we can't solve for $L$ because it's sandwiched between two matrices. So we use a Kronecker-vec trick to solve for $L$ in vector form.
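The gradient formula above is easy to sanity-check numerically with central finite differences (a NumPy sketch; the dimensions, seed, and names like `lam` are arbitrary choices of mine, and `D` is built symmetric because the derivation assumes it):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, lam = 4, 3, 0.5

L = rng.standard_normal((d, k))
theta = rng.standard_normal((d, 1))
s = rng.standard_normal((k, 1))
M = rng.standard_normal((d, d))
D = M + M.T  # symmetric, as the derivation assumes

def f(L):
    y = L @ s - theta
    return lam * np.sum(L**2) + float(y.T @ D @ y)

# analytic gradient from the answer: 2*(D y s^T + lam*L)
y = L @ s - theta
grad = 2 * (D @ y @ s.T + lam * L)

# central finite-difference approximation, entry by entry
eps = 1e-6
num = np.zeros_like(L)
for i in range(d):
    for j in range(k):
        E = np.zeros_like(L)
        E[i, j] = eps
        num[i, j] = (f(L + E) - f(L - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # True
```

Since $f$ is quadratic in $L$, the central difference is exact up to floating-point roundoff, so the two gradients agree tightly.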

Applying vec to both sides of the above equation, using the identity ${\rm vec}(AXB) = (B^T \otimes A)\,{\rm vec}(X)$, yields $$\eqalign{ (\lambda I + ss^T\otimes D)\,{\rm vec}(L) &= {\rm vec}(D\theta s^T)\cr }$$ This is a normal matrix-vector equation which can be solved for ${\rm vec}(L)$. The matrix $L$ can then be recovered by un-stacking the columns of the solution vector (vec stacks the columns of a matrix).