Derivative of an implicit matrix function

derivatives, linear algebra, matrices, matrix-calculus

Let $\mathbf{V}$ be an $N \times N$ real symmetric (or complex Hermitian) positive definite matrix, such that $\mathrm{det}(\mathbf{V})=1$.

By means of the implicit function theorem, its top-left entry $[V]_{1,1} \triangleq v_{11}$ can be expressed as:

  • Real symmetric (positive definite) $\mathbf{V}$:
    \begin{equation*}
    [V]_{1,1} \triangleq v_{11} = g_\mathbb{R}(v_{1,2},\ldots,v_{N,N}),
    \end{equation*}

    where $v_{1,2},\ldots,v_{N,N}$ are the $N(N+1)/2-1$ entries of the upper triangular part of $\mathbf{V}$, excluding $v_{11}$,
  • Complex Hermitian (positive definite) $\mathbf{V}$:
    \begin{equation*}
    [V]_{1,1} \triangleq v_{11} = g_\mathbb{C}(v_{1,2},v_{2,1},\ldots,v_{N,N}),
    \end{equation*}

    where $v_{1,2},v_{2,1},\ldots,v_{N,N}$ are the $N^2 - 1$ entries of $\mathbf{V}$ except $v_{11}$,

and $g_{\mathbb{R}}$, $g_{\mathbb{C}}$ are differentiable functions (for which I do not have explicit expressions).

How can I explicitly evaluate the following (column) vectors?
\begin{equation*}
\mathbf{s}_{\mathbb{C}} = \frac{\partial g_{\mathbb{C}}([\mathbf{V}]_{2,1},\ldots,[\mathbf{V}]_{N,N})}{\partial \underline{\mathrm{vec}}(\mathbf{V})}
\end{equation*}

for $\mathbf{V} \in \mathbb{C}^{N \times N}$ (Hermitian positive definite) and where $\mathrm{vec}(\mathbf{V}) \triangleq [v_{11},\underline{\mathrm{vec}}(\mathbf{V})^T]^T$ and

\begin{equation*}
\mathbf{s}_{\mathbb{R}} = \frac{\partial g_{\mathbb{R}}([\mathbf{V}]_{2,1},\ldots,[\mathbf{V}]_{N,N})}{\partial \underline{\mathrm{vech}}(\mathbf{V})}
\end{equation*}

for $\mathbf{V} \in \mathbb{R}^{N \times N}$ (symmetric positive definite) and where $\mathrm{vech}(\mathbf{V}) \triangleq [v_{11},\underline{\mathrm{vech}}(\mathbf{V})^T]^T$ and $\mathrm{vech}$ is the (column-wise) vectorization of the upper triangular part of $\mathbf{V}$.

Thanks!

Best Answer

$ \def\bbR#1{{\mathbb R}^{#1}} \def\d{\lambda}\def\g{\gamma} \def\u#1{\underline{#1}} \def\e{\varepsilon}\def\o{{\tt1}}\def\p{\partial} \def\E{E_{\o\o}} \def\M{M_{\o\o}} \def\W{W_{\o\o}} \def\V{V_{\o\o}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)}\def\BR#1{\Big(#1\Big)} \def\vec#1{\operatorname{vec}\LR{#1}} \def\vech#1{\operatorname{vech}\LR{#1}} \def\adj#1{\operatorname{adj}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\mc#1{\left[\begin{array}{c}#1\end{array}\right]} \def\m#1{\left[\begin{array}{r|rrr}#1\end{array}\right]} $Let $E_{ij}\in\bbR{N\times N}$ denote a standard basis matrix with the $(i,j)^{th}$ element equal to $\o$ and all others equal to $0$. Similarly use $\e_k\in\bbR{N^2\times\o}$ to denote a standard basis vector whose $k^{th}$ element equals $\o$.

Let's also use a colon to denote the Frobenius product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^n\sum_{j=1}^m A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=N^2,\,m=\o)$ it reduces to the standard dot product.
When applied to square matrices $(n=N,\,m=N)$ the trace definition is convenient.
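As a quick numerical illustration of the equivalent forms above (elementwise sum, trace, and dot product of the vectorizations), here is a minimal NumPy sketch. The helper name `frobenius` and the random test matrices are assumptions made for illustration only.

```python
import numpy as np

def frobenius(A, B):
    """Frobenius (double-dot) product A:B as an elementwise sum."""
    return np.sum(A * B)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

# Trace form Tr(A^T B) and dot product of the column-major vectorizations.
trace_form = np.trace(A.T @ B)
vec_form = A.reshape(-1, order="F") @ B.reshape(-1, order="F")

print(np.isclose(frobenius(A, B), trace_form))
print(np.isclose(frobenius(A, B), vec_form))
print(np.isclose(frobenius(A, A), np.linalg.norm(A, "fro") ** 2))
```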

The function in question can be rewritten in a form which is easily differentiated. $$\eqalign{ g &= \E:V \\ &= \vec{\E}:\vec{V} \\ &= {\e_{\o}}:v \\ dg &= {\e_{\o}}:dv \\ \grad{g}{v} &= {\e_{\o}} \\ }$$ In the symmetric case, let $$\eqalign{ w &= \vech{V}, \quad v &= \vec{V} &= Dw \\ }$$ where $D$ is the duplication matrix.

Then by a similar calculation $$\eqalign{ g &= {\e_{\o}}:v \\ &= {\e_{\o}}:Dw \\ &= D^T{\e_{\o}}:w \\ dg &= D^T{\e_{\o}}:dw \\ \grad{g}{w} &= D^T{\e_{\o}} \qquad\qquad\qquad \\ }$$ which is just the first basis vector for the half-vec space $\bbR{N(N+1)/2}$.
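The claim that $D^T\varepsilon_1$ is the first basis vector of the half-vec space can be checked numerically. The sketch below builds a duplication matrix under the assumption that vech stacks the lower triangle column by column (the usual convention, which for a symmetric matrix has the same $(1,1)$ entry in first position as any upper-triangular ordering). The function names `vec`, `vech`, and `duplication_matrix` are hypothetical helpers, not library calls.

```python
import numpy as np

def vec(A):
    """Column-major vectorization."""
    return A.reshape(-1, order="F")

def vech(A):
    """Stack the lower triangle of A column by column (assumed convention)."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def duplication_matrix(n):
    """D such that vec(S) = D @ vech(S) for every symmetric n x n matrix S."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    k = 0
    for j in range(n):
        for i in range(j, n):
            D[i + j * n, k] = 1.0  # position of S[i, j] in vec(S)
            D[j + i * n, k] = 1.0  # position of S[j, i] in vec(S)
            k += 1
    return D

N = 3
D = duplication_matrix(N)
e1 = np.zeros(N * N)
e1[0] = 1.0

# Gradient of g = V[0, 0] with respect to vech(V).
print(D.T @ e1)
```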

Update #1

The comments pointed out that underlined quantities explicitly exclude the $\V$ element, e.g. $$\u{v}=\u{\rm vec}(V)$$ For these underlined vec/vech operators, both of the derivatives are zero vectors of dimensions $\,\bbR{N^2-\o}\,$ and $\,\bbR{N(N+\o)/2-\o}\,$ respectively.

Update #2

Based on the comments, the intent of the question is to consider $\V$ as an implicit function of the remaining elements of $V$ based on the constraint $\,\det(V)=\o.\,$

The first step is to reverse the vectorizations and reconstitute the matrix. Since the $(\o,\o)$ element is not part of the vectors, the respective unvec operators will set it to zero in the reconstituted matrix, i.e.
$$\eqalign{ &M = \u{\rm unvec}(\u{v}) = \u{\rm unvech}(\u{w}) \\ &\M = 0 \\ }$$ Calculate the inverse and determinant of this reconstituted matrix. $$\eqalign{ W &= M^{-1} \\ \d &= \det(M) \qiq \g=\det(W)=\d^{-1} \\ }$$ The remaining task is to calculate $\M$ such that the constraint $\d=\o$ is satisfied.

The Jacobi formula tells us that $$\eqalign{ \d &= \det(M) \qiq d\d = \d W^T:dM \\ }$$ If the change $dM$ is restricted to its $(\o,\o)$ element then we are left with the scalar equation $$\eqalign{ d\d &= \d \W\;d\M \\ }$$ Because the determinant is an affine function of any single entry, this relation is exact for a finite change $d\M$, not merely to first order. The increment $d\d=(\o-\d)\,$ yields $\,(\d+d\d)=\o,\,$ which satisfies the constraint.

Therefore incrementing the $\M$ element by $$\eqalign{ d\M &= \frac{\o-\d}{\d\W} \qiq V = M + \E\,d\M \\ }$$ will satisfy the constraint, assuming that neither $\W$ nor $\d$ is equal to zero. For the proposed ${V},\,$ applying the Matrix Determinant Lemma will verify that indeed $\,\det({V})=\o$.
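The recipe above can be sketched in NumPy as follows. The function name `solve_v11` and the random test matrix are assumptions for illustration; the code takes $M$ to be invertible with a nonzero $(1,1)$ entry in its inverse, as stated above.

```python
import numpy as np

def solve_v11(M):
    """Given M with M[0, 0] == 0, return the value t such that
    det(M + t * E11) == 1, using t = (1 - lam) / (lam * W11)."""
    lam = np.linalg.det(M)
    W = np.linalg.inv(M)
    return (1.0 - lam) / (lam * W[0, 0])

# Build a symmetric positive definite test matrix with unit determinant.
rng = np.random.default_rng(2)
N = 4
B = rng.standard_normal((N, N))
V0 = B @ B.T + np.eye(N)
V0 /= np.linalg.det(V0) ** (1.0 / N)

# Erase the (1,1) entry, then recover it from the constraint.
M = V0.copy()
M[0, 0] = 0.0
V = M.copy()
V[0, 0] = solve_v11(M)

print(V0[0, 0], V[0, 0], np.linalg.det(V))
```

Since $\lambda\,W_{11}$ is the $(1,1)$ cofactor of $M$, the same value can also be computed as `(1 - det(M)) / det(M[1:, 1:])`, which avoids inverting $M$.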

Now we can write an explicit expression for the $g$-function $$\eqalign{ g = \V = \LR{\frac{\o-\d}{\d\W}} = \LR{\frac{\det(W)-\o}{\E:W}} \\ }$$ which we can differentiate $$\eqalign{ dg &= \LR{\frac{d\det(W)}{\E:W}} - \LR{\frac{\det(W)-\o}{(\E:W)^2}}\BR{\E:dW} \\ &= \LR{\frac{\g M:dW}{\E:W}} - \LR{\frac{\g-\o}{(\E:W)^2}}\BR{\E:dW} \\ &= \LR{\frac{(\E:W)\g M-(\g-\o)\E}{(\E:W)^2}}:dW \\ &= \LR{\frac{(\g-\o)\E-\g\W M}{\W^2}}:W\,dM\,W \\ &= W\LR{\frac{(\g-\o)\E-\g\W M}{\W^2}}W:dM \\ &= \LR{\frac{(\g-\o)\LR{W\E W}-\g\W W}{\W^2}}:dM \\ &=\u{\rm vec}\LR{\frac{(\g-\o)\LR{W\E W}-\g\W W}{\W^2}}:d\u{v} \\ \grad{g}{\u{v}} &=\u{\rm vec}\LR{\frac{(\g-\o)\LR{W\E W}-\g\W W}{\W^2}} \\ }$$ The Duplication matrix immediately yields the other derivative $$\eqalign{ \grad{g}{\u{w}} &= \u{D}^T \LR{\grad{g}{\u{v}}} \qquad\qquad\qquad\qquad\qquad\qquad\qquad\quad \\ \\ }$$
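For the real symmetric case, the matrix multiplying $dM$ in the derivation above can be compared against central finite differences. This is only a sanity-check sketch: the names `g_value`, `g_gradient_matrix`, and `max_fd_error` are hypothetical, and `g_value` deliberately evaluates $g$ through the $(1,1)$ cofactor rather than through $W$, so that the check is independent of the formula being tested.

```python
import numpy as np

def g_value(M):
    """g as a function of M (with M[0, 0] == 0), using
    det(M + t * E11) = det(M) + t * det(M[1:, 1:])."""
    return (1.0 - np.linalg.det(M)) / np.linalg.det(M[1:, 1:])

def g_gradient_matrix(M):
    """Matrix G with dg = G : dM, i.e.
    G = ((gam - 1) W E11 W - gam W11 W) / W11^2 for symmetric M."""
    W = np.linalg.inv(M)
    gam = np.linalg.det(W)
    w11 = W[0, 0]
    WEW = np.outer(W[:, 0], W[0, :])  # W @ E11 @ W
    return ((gam - 1.0) * WEW - gam * w11 * W) / w11**2

def max_fd_error(M, h=1e-6):
    """Largest gap between G : P and a central difference over all
    symmetric unit perturbations P that leave the (1,1) entry alone."""
    n = M.shape[0]
    G = g_gradient_matrix(M)
    worst = 0.0
    for i in range(n):
        for j in range(i + 1):
            if i == 0 and j == 0:
                continue
            P = np.zeros((n, n))
            P[i, j] = 1.0
            P[j, i] = 1.0
            fd = (g_value(M + h * P) - g_value(M - h * P)) / (2.0 * h)
            worst = max(worst, abs(fd - np.sum(G * P)))
    return worst

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
M = B @ B.T + np.eye(4)
M[0, 0] = 0.0
print(max_fd_error(M))
```

For an off-diagonal pair the directional derivative is $G_{ij}+G_{ji}$, which is exactly what multiplying by $D^T$ produces in the half-vec form.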


NB: The following matrix removes the first element of an $n$-vector $$\eqalign{ R(n) = \m{ 0_{n-1} & I_{n-1}}, \quad {\rm e.g.}\;\; R(4) = \m{ 0 & \o & 0 & 0 \\ 0 & 0 & \o & 0 \\ 0 & 0 & 0 & \o \\ } \in \bbR{3\times 4} \\ }$$ For typing convenience, let $R=R\BR{N^2}$ and $Q=R\BR{N(N+\o)/2}$

This allows all of the underlined symbols to be expressed in standard matrix notation $$\eqalign{ \u{w} &= \u{\rm vech}(V) &= Q\;\vech{V} &= Qw \\ \u{v} &= \u{\rm vec}(V) &= R\;\vec{V} &= Rv \\ \u{D} &= RDQ^T \\ }$$ $$\eqalign{ M &= \u{\rm unvech}(\u{w}) &= {\rm unvech}(Q^T\u{w})\quad \\ &= \u{\rm unvec}(\u{v}) &= {\rm unvec}(R^T\u{v}) \\ }$$
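A small NumPy sketch of the removal matrix and the padded unvec operation may make the notation concrete. The helper name `drop_first` and the test matrix are assumptions for illustration.

```python
import numpy as np

def drop_first(n):
    """R(n): the (n-1) x n matrix [0 | I] that deletes the first entry
    of an n-vector."""
    return np.hstack([np.zeros((n - 1, 1)), np.eye(n - 1)])

N = 3
R = drop_first(N * N)

V = np.arange(1.0, N * N + 1).reshape(N, N, order="F")  # arbitrary test matrix
v = V.reshape(-1, order="F")   # vec(V), column-major
v_under = R @ v                # underlined vec: vec(V) without its first entry

# R^T pads a zero back in front, so unvec yields M with M[0, 0] == 0.
M = (R.T @ v_under).reshape(N, N, order="F")
print(M)
```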

Update #3

Exploiting the block structure of the matrix to evaluate the determinant yields an alternate expression for the $g$-function. $$\eqalign{ V &= \mc{\V & x^T \\ x&Y} \;=\; V^T \\ \det(V) &= \LR{\V-x^TY^{-1}x} \det(Y) \;&\doteq\; \large\o \\ \V &= x^TY^{-1}x + \det(Y^{-1}) \;&\doteq\; g \\ }$$ Differentiating this expression, using $d\det(Y^{-1}) = -\det(Y^{-1})\,Y^{-1}:dY$, gives $$\eqalign{ dg &= 2Y^{-1}x:dx - Y^{-1}\LR{xx^T+\frac Y{\det Y}}Y^{-1}:dY \\\\ &= \mc{ 2Y^{-1}x \\ -D^T\LR{Y\otimes Y}^{-1}\vec{xx^T+\frac Y{\det Y}} \\ }:d\u{w} \\\\ \grad{g}{\u{w}} &= \mc{ 2Y^{-1}x \\ -D^T\LR{Y\otimes Y}^{-1}\vec{xx^T+\frac Y{\det Y}} \\ } \\\\ }$$ where $D$ is now the duplication matrix for the $(N-\o)\times(N-\o)$ block $Y$. Note that $$\eqalign{ {\rm vech}(V) = \mc{\V \\ x \\ {\rm vech}(Y)} = \mc{\V \\ x \\ y} ,\qquad \u{w} = \u{\rm vech}(V) = \mc{x \\ y} \\ }$$
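A finite-difference check of the block expression for $g$ can be sketched in NumPy for the real symmetric case. All function names here are hypothetical. The $dY$ coefficient in the code is written out term by term from $d(x^TY^{-1}x) = 2Y^{-1}x:dx - Y^{-1}xx^TY^{-1}:dY$ and $d\det(Y^{-1}) = -\det(Y^{-1})\,Y^{-1}:dY$.

```python
import numpy as np

def g_block(x, Y):
    """g = x^T Y^{-1} x + 1/det(Y), the (1,1) entry that makes det(V) = 1."""
    return x @ np.linalg.solve(Y, x) + 1.0 / np.linalg.det(Y)

def assemble(x, Y):
    """Build V = [[g, x^T], [x, Y]]."""
    n = x.size + 1
    V = np.empty((n, n))
    V[0, 0] = g_block(x, Y)
    V[0, 1:] = x
    V[1:, 0] = x
    V[1:, 1:] = Y
    return V

def coeff_dx(x, Y):
    """Coefficient of dx: 2 Y^{-1} x (Y symmetric)."""
    return 2.0 * np.linalg.solve(Y, x)

def coeff_dY(x, Y):
    """Coefficient of dY: -(Y^{-1} x x^T Y^{-1} + Y^{-1} / det(Y))."""
    Yinv = np.linalg.inv(Y)
    return -(Yinv @ np.outer(x, x) @ Yinv + Yinv / np.linalg.det(Y))

rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3))
Y = B @ B.T + np.eye(3)          # symmetric positive definite block
x = rng.standard_normal(3)

# Random perturbation directions (S is symmetric, like dY).
dx = rng.standard_normal(3)
S = rng.standard_normal((3, 3))
S = S + S.T
h = 1e-6

fd_x = (g_block(x + h * dx, Y) - g_block(x - h * dx, Y)) / (2.0 * h)
fd_Y = (g_block(x, Y + h * S) - g_block(x, Y - h * S)) / (2.0 * h)

print(np.linalg.det(assemble(x, Y)))
print(fd_x - coeff_dx(x, Y) @ dx)
print(fd_Y - np.sum(coeff_dY(x, Y) * S))
```

Because $\V - x^TY^{-1}x = 1/\det(Y) > 0$ whenever $Y$ is positive definite, the assembled $V$ is itself positive definite, so this construction stays inside the set the question is about.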