Lipschitz continuity of matrix sqrt

inverselipschitz-functionsmatrices

This is a special case of the question here: Lipschitz continuity of matrix product, but many of the details are unclear to me.

Let $A \in \mathbb{R}^{a \times b}$ and $B \in \mathbb{R}^{a \times c}$ be fixed and have finite norm (e.g., spectral norm). Then, I am interested in whether the function $$f(X) = [(A – BX)'(A – BX)]^{-1/2}$$ is Lipschitz over the set of $X$ such that the minimum and maximum singular values of $(A – BX)$ are bounded below and above by a known positive constants (say, $k$ and $k^{-1}$ for $k > 0$).

If the function is Lipschitz over this set, how would I prove this?

Best Answer

If we can prove $f$ is differentiable with bounded derivative $Df$, Lipschitz continuity follows from the fundamental theorem of calculus. The Lipschitz constant then equals the maximum of $\lVert Df \rVert$.

Denote the space of $b\times b$-symmetric matrices by $\textrm{Sym}_b$. Define \begin{align*} &f_1: \mathbb{R}^{c\times b} \rightarrow \mathrm{Sym}_b &f_1(X) = (A+BX)^T(A+BX) \\ &f_2: \textrm{Sym}_b \rightarrow \textrm{Sym}_b &f_2(X) = X^\frac{1}{2} \\ &f_3: GL(b,\mathbb{R}) \rightarrow GL(b,\mathbb{R}) &f_3(X) = X^{-1}. \end{align*} As long as the singular values of $A+BX$ stay away from zero, $f = f_3\circ f_2 \circ f_1$ is defined, and $Df(X) = Df_3\left(f_2(f_1(X))\right)\cdot Df_2(f_1(X))\cdot Df_1(X)$ by the chain rule.

First, $Df_1(X)Y = Y^TB^T(A+BX) + (A+BX)^TBY$, hence $\lVert Df_1(X) \rVert_2 \leq \lVert B^T(A+BX)\rVert_2 + \lVert (A+BX)^TB \rVert_2$ by the triangle inequality. Now, since the largest singular value of $A+BX$ is bounded by $r$, $\lVert A+BX \rVert_2 \leq r$ and thus $\lVert Df_1(X) \rVert_2 \leq 2r \lVert B \rVert_2$.

The square root we can differentiate via the inverse function theorem. Indeed, let $g: \textrm{Sym}_b \rightarrow \textrm{Sym}_b$, $g(X) = X^2$, then $Id_b = D(g\circ f_2) = (Dg\circ f_2) \cdot Df_2$, hence $Df_2 = Id_b / (Dg \circ f_2)$. By the product rule, $Dg(X)Y = XY+YX$. Fortunately, we can diagonalize this map: Let $u_1,\dots,u_b$ be an orthonormal base of eigenvectors of $X$ with eigenvalues $\mu_1,\dots,\mu_b$. Then $Y_{jk} = \frac{1}{2}\left(u_ju_k^T + u_ku_j^T\right)$, $1\leq j\leq k \leq b$, gives a basis of $\textrm{Sym}_b$, which is orthonormal w.r.t. the Frobenius scalar product $\left< A,B\right> = \mathrm{tr}(AB)$ on $\mathrm{Sym}_b$ and it's induced norm $\lVert \cdot \rVert_F$. Since $$XY_{jk} + Y_{jk}X = \frac{1}{2}\left( Xu_j u_k^T + Xu_k u_j^T + u_j u_k^T X + u_k u_j^T X \right) = \frac{1}{2}\left( \mu_j u_j u_k^T + \mu_k u_k u_j^T + \mu_k u_j u_k^T + \mu_j u_k u_j^T \right) = (\mu_j+\mu_k)Y_{jk},$$ the eigenvalues of $Df_2(X) = Id_b/Dg(f_2(X))$ are given by $(\lambda_j+\lambda_k)^{-1}$, where $\lambda_1 \dots \lambda_b$ are the eigenvalues of $f_2(X)$. Since $Df_2$ is additionally symmetric w.r.t. the Frobenius scalar product, we infer $\lVert Df_2(X) \rVert_F \leq 2\frac{1}{|\xi|}$, where $\xi$ is the eigenvalue of $f_2(X)$ with the smallest absolute value. In our case, $\xi$ is the smallest singular value of $A+BX$, which we assumed to be greater than $r^{-1}$. Since all norms on finite dimensional vector spaces are equivalent, $\lVert Df_2(f_1(X))\rVert_2 \leq Cr $ for a dimension-dependent constant $C$.

Finally, it's very well known that $Df_3(X)Y = X^{-1}YX^{-1}$, hence $\lVert Df_3 \rVert_2 \leq \lVert X^{-1} \rVert_2^2$. This implies $\lVert Df_3(f_2(f_1(X))) \rVert_2 \leq r^2$.

Altogether, we find $\lVert Df(X) \rVert_2 \leq \lVert Df_3(f_2(f_1(X))) \rVert_2 \lVert Df_2(f_1(X)) \rVert_2 \lVert Df_1(X) \rVert_2 \leq 2Cr^4\lVert B \rVert_2$, yielding the desired Lipschitz constant.

Related Question