[Math] Seeking proof for linear algebra constraint problem.

algorithms, linear-algebra, na.numerical-analysis, oc.optimization-and-control

Given a symmetric real matrix $M$ with zero diagonal, I am trying to find a diagonal matrix $D$ such that the matrix $M + D$ is positive definite and $(M+D)^{-1}$ has a diagonal consisting of all 1's. This problem looks vaguely like a semidefinite programming problem, except that both the matrix $M+D$ and its inverse have linear constraints. Overall, the system has as many constraints as variables.
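
For concreteness, here is a small numerical check of the two constraints; the function name and tolerance are my own choices, not part of the question.

```python
import numpy as np

def is_solution(M, D, tol=1e-8):
    """Check the two constraints: M + D is positive definite and the
    diagonal of (M + D)^{-1} consists of all 1's."""
    X = M + D
    if np.linalg.eigvalsh(X).min() <= 0:       # X is symmetric, so eigvalsh applies
        return False
    return np.allclose(np.diag(np.linalg.inv(X)), 1.0, atol=tol)
```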

Based on small-scale numerical testing, it strongly appears that there is always a unique solution. I've implemented an algorithm (which is $O(n^6)$, $n$ being the size of the matrix) that works by constructing a second matrix $X$ and minimizing $||(M+D) X - I||$ with respect to $D$ and then with respect to $X$ in an alternating fashion. Note that I am using the Frobenius norm here. Given a proper initialization, such that $M+D$ is positive definite, this usually appears to converge, and each step is simply a quadratic minimization.
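
A minimal sketch of such an alternating scheme (my own illustrative implementation, not necessarily the exact algorithm described above; the diagonally dominant initialization and the fixed iteration count are assumptions). Each substep is a linear least-squares problem: the update of $X$ solves $(M+D)X \approx I$, and each $d_i$ minimizes a one-dimensional quadratic.

```python
import numpy as np

def alternating_fit(M, iters=500):
    """Alternately minimize ||(M+D)X - I||_F over X (a full matrix) and over
    the diagonal matrix D.  Illustrative sketch only."""
    n = M.shape[0]
    I = np.eye(n)
    # Initialization chosen so that M + D is positive definite (diagonal dominance).
    d = np.abs(M).sum(axis=1) + 1.0
    for _ in range(iters):
        A = M + np.diag(d)
        # Minimize over X with D fixed: least-squares solution of A X = I.
        X = np.linalg.lstsq(A, I, rcond=None)[0]
        # Minimize over D with X fixed: row i of (M+D)X - I equals
        # (M X - I)[i] + d_i * X[i], a quadratic in the scalar d_i.
        R = M @ X - I
        d = -(R * X).sum(axis=1) / (X * X).sum(axis=1)
    return np.diag(d)

# Example (2x2): if the iteration converges, the diagonal of the inverse is ~1.
# M = np.array([[0.0, 0.3], [0.3, 0.0]]); D = alternating_fit(M)
# np.diag(np.linalg.inv(M + D))
```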

That said, I have no proof that there is a unique solution for $D$, or that the algorithm above works in the general case, and moreover I have a strong intuition that there is an algorithm which is closer to $O(n^3)$.

What I seek is a proof or a theoretical justification that the solution for $D$ is unique (or, of course, a counterexample). Even better would be a provably polynomial-time algorithm to find $D$.

My approach for a proof up to this point has been along the lines of finding some error function $f(X)$, $X = M + D$, such as $f(X) = \sum_i ((X^{-1})_{ii} - 1)^2$. This function (and lots of other variants) has a minimum at the desired solution. My hope was to then show that the function is convex over all positive definite matrices $X$. However, I have not been able to accomplish this so far.
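
For what it's worth, a direct minimization of this error function over the diagonal entries could look as follows (illustrative code with assumed names; the crude barrier for leaving the positive definite cone and the choice of optimizer are mine).

```python
import numpy as np
from scipy.optimize import minimize

def solve_via_error_function(M):
    """Minimize f(d) = sum_i ((M + diag(d))^{-1}_{ii} - 1)^2 over the diagonal
    entries d, starting from a diagonally dominant (hence positive definite)
    initial point.  Illustrative sketch only."""
    def f(d):
        X = M + np.diag(d)
        if np.linalg.eigvalsh(X).min() <= 1e-10:
            return 1e10                          # crude barrier: stay positive definite
        return np.sum((np.diag(np.linalg.inv(X)) - 1.0) ** 2)

    d0 = np.abs(M).sum(axis=1) + 1.0
    res = minimize(f, d0, method="Nelder-Mead",
                   options={"xatol": 1e-12, "fatol": 1e-16, "maxiter": 50000})
    return np.diag(res.x)
```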

Edit: For the $f(X)$ given above I have found a number of counterexamples to its convexity, although perhaps the overall method is still salvageable with a different error function.
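
One way such counterexamples can be found numerically is a random search for violations of midpoint convexity (illustrative code; a strictly positive return value witnesses non-convexity).

```python
import numpy as np

def worst_midpoint_convexity_gap(M, trials=1000, seed=0):
    """Search for violations of midpoint convexity of
    f(X) = sum_i ((X^{-1})_{ii} - 1)^2 on the set {X = M + D : X > 0}."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]

    def f(X):
        return np.sum((np.diag(np.linalg.inv(X)) - 1.0) ** 2)

    def random_point():
        # A random diagonal D making M + D strictly diagonally dominant.
        d = np.abs(M).sum(axis=1) + rng.uniform(0.1, 3.0, size=n)
        return M + np.diag(d)

    worst = -np.inf
    for _ in range(trials):
        X1, X2 = random_point(), random_point()
        gap = f(0.5 * (X1 + X2)) - 0.5 * (f(X1) + f(X2))
        worst = max(worst, gap)   # gap > 0 means convexity fails between X1 and X2
    return worst
```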

Edit: Some additional facts I've been able to show (in part with help from the comments):

The set of all positive definite matrices of the form $X = M + D$ is clearly a convex set (since it is the intersection of two convex sets: the positive definite matrices and the set of all matrices whose off-diagonal entries agree with those of $M$).

Moreover, the set of all $X$ above such that every diagonal element satisfies $(X^{-1})_{ii} \le 1$ is also a convex set. This follows from the convexity of $X \mapsto e^T X^{-1} e$ over positive definite matrices (applied with $e$ equal to each standard basis vector), together with the statement above. The solution in question is clearly on the boundary of this set.
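
A quick numerical sanity check of the convexity of $X \mapsto e^T X^{-1} e$ on positive definite matrices (illustrative only; it tests midpoint convexity along random pairs):

```python
import numpy as np

def check_inverse_quadratic_form_convexity(n=5, trials=200, seed=1):
    """Check that g(X) = e^T X^{-1} e is midpoint convex on positive definite
    matrices: g((X1 + X2)/2) <= (g(X1) + g(X2))/2, up to rounding error."""
    rng = np.random.default_rng(seed)
    e = np.zeros(n)
    e[0] = 1.0
    g = lambda X: e @ np.linalg.solve(X, e)

    worst = -np.inf
    for _ in range(trials):
        A1 = rng.standard_normal((n, n))
        A2 = rng.standard_normal((n, n))
        X1 = A1 @ A1.T + 0.1 * np.eye(n)
        X2 = A2 @ A2.T + 0.1 * np.eye(n)
        worst = max(worst, g(0.5 * (X1 + X2)) - 0.5 * (g(X1) + g(X2)))
    return worst   # should never be (significantly) positive
```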

Best Answer

(Edit: my original answer was perhaps not clear enough, let me try to improve it).

First some notation: for a matrix $x$, let me denote by $E(x)$ the diagonal matrix with the same diagonal as $x$: if $x=(x_{i,j})_{i,j\leq n}$, then $E(x) = (x_{i,j}\delta_{i,j})_{i,j \leq n}$. Equivalently, $E$ is the orthogonal projection onto the diagonal matrices when you consider the usual Euclidean structure on $M_n$. I will also write $x>0$ to mean that $x$ is symmetric positive definite.

You are asking whether the map $f:x \mapsto x^{-1} - E(x^{-1})$ is a bijection from $D=\{x \in M_n(\mathbb R), x>0\textrm{ and }E(x)=1\}$ onto $I=\{x \in M_n, x=x^*\textrm{ and }E(x)=0\}$. (Indeed, setting $y=(M+D)^{-1}$, your problem asks for a unique $y$ with $y>0$, $E(y)=1$ and $f(y)=M$; the diagonal matrix is then recovered as $D=E(y^{-1})$.) And the answer is yes. I prove first that $f$ is injective, and then that it is surjective.
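
To fix ideas, here is a small numerical illustration of this reformulation (my own code, not part of the answer): starting from any $y>0$ with $E(y)=1$, the matrix $M=f(y)$ lies in $I$, and $D=E(y^{-1})$ solves the original problem for that $M$.

```python
import numpy as np

def illustrate_reformulation(n=4, seed=2):
    """Build a random y > 0 with E(y) = 1 (a correlation matrix), then check
    that M = f(y) = y^{-1} - E(y^{-1}) is symmetric with zero diagonal and
    that D = E(y^{-1}) solves the original problem: (M + D)^{-1} = y has a
    diagonal of all 1's."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    y = A @ A.T + n * np.eye(n)
    y = y / np.sqrt(np.outer(np.diag(y), np.diag(y)))   # rescale so E(y) = 1

    yinv = np.linalg.inv(y)
    D = np.diag(np.diag(yinv))                          # E(y^{-1})
    M = yinv - D                                        # f(y): zero diagonal

    return np.allclose(np.diag(np.linalg.inv(M + D)), 1.0)
```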


f is injective

In fact let me prove the following fact, which is equivalent to the injectivity of $f$.

Let $x$ be a positive matrix. If $d$ is a diagonal matrix such that $x+d$ is positive and such that $x^{-1}$ and $(x+d)^{-1}$ have the same diagonals, then $d=0$.

Proof: Since $d$ is diagonal and $x^{-1}$, $(x+d)^{-1}$ have the same diagonal, the trace of $d\left(x^{-1} - (x+d)^{-1}\right)$ is zero (it equals $\sum_i d_{ii}\left(x^{-1} - (x+d)^{-1}\right)_{ii}$). But one can write this expression as \[dx^{-1/2}\left(1- (1+x^{-1/2}dx^{-1/2})^{-1}\right)x^{-1/2},\] so that, taking the trace and denoting $a=x^{-1/2}dx^{-1/2}$, we get $0= Tr(a(1-(1+a)^{-1}))$.

If $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $a$, the condition that $x+d$ is positive becomes $1+\lambda_i >0$, and the last equality becomes $0 = \sum_i \lambda_i(1-1/(1+\lambda_i)) = \sum_i \lambda_i^2/(1+\lambda_i)$, which is possible only if the $\lambda_i$ are all zero, i.e. $a=0$, i.e. $d=0$.
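
A numerical check of the key identity in this computation (illustrative code, with a randomly chosen positive $x$ and a small diagonal $d$ so that $x+d>0$):

```python
import numpy as np

def injectivity_identity_check(n=5, seed=3):
    """Verify numerically that Tr(d (x^{-1} - (x+d)^{-1})) equals
    sum_i lambda_i^2 / (1 + lambda_i), where the lambda_i are the eigenvalues
    of a = x^{-1/2} d x^{-1/2}; in particular the trace is >= 0 and vanishes
    only for d = 0."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    x = A @ A.T + n * np.eye(n)                     # a positive matrix
    d = np.diag(rng.uniform(-0.5, 0.5, size=n))     # small diagonal, so x + d > 0

    lhs = np.trace(d @ (np.linalg.inv(x) - np.linalg.inv(x + d)))

    # a = x^{-1/2} d x^{-1/2} is similar to x^{-1} d, so they share eigenvalues.
    lam = np.linalg.eigvals(np.linalg.solve(x, d)).real
    rhs = np.sum(lam ** 2 / (1.0 + lam))
    return lhs, rhs                                 # equal up to rounding error
```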


f is surjective

The surjectivity is true just for topological reasons. More precisely, to prove that $f$ is surjective, it is enough to prove that it is continuous, open and proper (because this would imply that the image is an open and closed subset of $I$, and hence everything, since $I$ is connected). The continuity is obvious. $f$ is even differentiable, and the differential is explicitly computable and easily seen to be invertible at every point, so that $f$ is indeed open. It remains to check that it is proper.
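
As a sanity check of the openness step, one can compare the differential with finite differences. I am assuming here the standard formula $D(x\mapsto x^{-1})[h]=-x^{-1}hx^{-1}$, which gives $Df(x)[h] = -x^{-1}hx^{-1} + E(x^{-1}hx^{-1})$ for $h$ symmetric with zero diagonal (the tangent space of $D$); this formula is not spelled out in the answer.

```python
import numpy as np

def differential_check(n=4, eps=1e-6, seed=4):
    """Compare the claimed differential of f(x) = x^{-1} - E(x^{-1}) with a
    central finite difference, at a point x with E(x) = 1 and along a
    symmetric zero-diagonal direction h."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    x = A @ A.T + n * np.eye(n)
    x = x / np.sqrt(np.outer(np.diag(x), np.diag(x)))   # E(x) = 1, x > 0

    H = rng.standard_normal((n, n))
    h = H + H.T
    np.fill_diagonal(h, 0.0)                            # tangent direction

    E = lambda m: np.diag(np.diag(m))
    f = lambda m: np.linalg.inv(m) - E(np.linalg.inv(m))

    fd = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)
    xinv = np.linalg.inv(x)
    exact = -xinv @ h @ xinv + E(xinv @ h @ xinv)
    return np.max(np.abs(fd - exact))                   # O(eps^2), i.e. tiny
```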

The proof I have is not completely obvious, maybe I am missing something. Let me only sketch it. Take a sequence $x_k \in D$ that escapes every compact subset of $D$. Since $\|x\|\leq n$ for all $x \in D$, we have that $u_k=\|x_k^{-1}\|\to \infty$ (I consider the operator norm, and the inequality $\|x\|\leq n= Tr(x)$ is because the norm of $x>0$ is its largest eigenvalue, whereas its trace is the sum of its eigenvalues). We want to prove that $\|f(x_k)\|\to \infty$. Assume for contradiction that this is not the case, and that $\|f(x_k)\|\leq C$ for all $k$. We will get a contradiction through a careful study of the spectral decomposition of $x_k$.

Let $\xi_k$ be a sequence of unit eigenvectors of $x_k$ relative to the smallest eigenvalue of $x_k$, i.e. $x_k \xi_k = 1/u_k \xi_k$. Now the key observation: the assumption that $\|f(x_k)\|\leq C$ implies that, for every diagonal matrix $d$ with $1$ or $-1$ on the diagonal, the distance from $d \xi_k$ to the space $E_k$ spanned by the eigenvectors of $x_k^{-1}$ relative to the eigenvalues in the interval $[u_k/2,u_k]$ goes to zero.

For a proof, consider the random diagonal matrix $d$ in which the diagonal entries are iid random variables uniform in $\{-1,1\}$, so that $E(x) = \mathbb E (d x d)$ (hoping there will be no confusion between $E$ and $\mathbb E$). Then $\langle f(x_k) \xi_k,\xi_k\rangle = \mathbb E ( u_k - \langle x_k^{-1} d \xi_k, d \xi_k\rangle)$. The lhs of this equality is by assumption smaller than $C$. On the rhs, $u_k - \langle x_k^{-1} d \xi_k, d \xi_k\rangle \geq 0$ because $d \xi_k$ is a unit vector and $u_k=\|x_k^{-1}\|$. This implies that $u_k - \langle x_k^{-1} d \xi_k, d \xi_k\rangle \leq 2^n C$ for any diagonal matrix $d$ with $\pm 1$ on the diagonal. But now use the fact that, for $x>0$ in $M_n$, if a unit vector $\xi$ in $\mathbb R^n$ satisfies $\langle x \xi,\xi\rangle \geq \|x\|-\delta$, then $\xi$ is at distance less than $\sqrt{2\delta/\|x\|}$ from the space spanned by the eigenvectors of $x$ relative to eigenvalues in the interval $[\|x\|/2,\|x\|]$ (hint for a proof: consider the decomposition of $\xi$ in an orthonormal basis of eigenvectors of $x$). Here, taking $\epsilon_k = \sqrt{2^{n+1} C/ u_k}$, we have indeed proved that $d \xi_k$ is at distance less than $\epsilon_k$ from $E_k$ for any diagonal matrix $d$ with $\pm 1$ on the diagonal.
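
The identity $E(x) = \mathbb E(dxd)$ used above is easy to check by brute force over all $2^n$ sign matrices (illustrative code):

```python
import numpy as np
from itertools import product

def average_dxd_equals_diagonal(n=4, seed=5):
    """Average d x d over all 2^n diagonal sign matrices d and compare with
    E(x), the diagonal part of x; off-diagonal entries cancel because the
    signs d_i d_j (i != j) average to zero."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, n))
    x = x + x.T

    avg = np.zeros_like(x)
    for signs in product([-1.0, 1.0], repeat=n):
        d = np.diag(signs)
        avg += d @ x @ d
    avg /= 2 ** n
    return np.max(np.abs(avg - np.diag(np.diag(x))))    # ~ 0
```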

I now claim that there is a vector $\eta_k$ in the canonical basis of $\mathbb R^n$ at distance less than $\sqrt n \epsilon_k$ from $E_k$. This will conclude the proof, since it will in particular imply that $\langle x_k \eta_k,\eta_k\rangle \to 0$, whereas the assumption $E(x_k)=1$ implies that $\langle x_k \eta_k,\eta_k\rangle = 1$, a contradiction. To prove the claim, let $i$ be such that the $i$-th coordinate of $\xi_k$ is larger than $1/\sqrt n$ in absolute value. Observe that $\xi_k(i) e_i$ is the expected value of $d \xi_k$, where $d$ is the same random matrix as above, but conditioned on $d_i = 1$. This implies that $\xi_k(i) e_i$ is at distance at most $\epsilon_k$ from $E_k$, which proves the claim (take $\eta_k = e_i$ and use $|\xi_k(i)| \geq 1/\sqrt n$).

A remark: I do not like this proof, since it really relies on finite-dimensional techniques. In particular, it does not extend to general von Neumann algebras (whereas the injectivity part does). I would prefer a more direct proof.
