[Math] Trace of symmetric positive semidefinite matrix when diagonalized (as a bilinear form) in a non-orthogonal basis

bilinear-form, linear-algebra, matrices, trace

Let $\mathbf{S}$ be a symmetric positive semidefinite matrix (i.e. one with all eigenvalues real and non-negative). Then there is an orthogonal matrix $\mathbf{U}$ (whose columns form an orthonormal basis) such that $\mathbf{U}^\top \mathbf{S} \mathbf{U}$ is diagonal; this basis is of course given by the eigenvectors of $\mathbf{S}$.

Consider another basis $\mathbf{V}$ whose columns have unit length but are not orthogonal, and which also diagonalizes $\mathbf{S}$, i.e. $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ is diagonal.

I suspect that the following is true: $\mathrm{Tr}(\mathbf{V}^\top \mathbf{S} \mathbf{V}) \le \mathrm{Tr}(\mathbf{S})=\mathrm{Tr}(\mathbf{U}^\top \mathbf{S} \mathbf{U})$. Is it true? If so, how can it be proved?

Furthermore, is it true that equality holds iff $\mathbf{V}$ is orthogonal?

Update: Following some confusion in the comments, I would like to clarify that I am considering $\mathbf{S}$ to represent a bilinear form, not a linear operator. So under a change of basis it transforms as $\mathbf{V}^\top \mathbf{S} \mathbf{V}$, not as $\mathbf{V}^{-1} \mathbf{S} \mathbf{V}$.


Update 2

Let me illustrate where this question comes from; it might provide some additional intuition. $\mathbf{S}$ is actually a covariance matrix of some data: I have a set of data points $\mathbf{x}_i \in \mathbb{R}^N$, and $\mathbf{S} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top$, up to a constant factor. The trace of $\mathbf{S}$ is the total variance of the data, and it of course stays the same if the coordinate system is rotated.

Now for any unit vector $\mathbf{v}$, the variance of the projection of the data onto the axis defined by this vector equals $\mathbf{v}^\top\mathbf{S}\mathbf{v}$. If I take $N$ orthogonal unit vectors, then the sum of these variances equals the total variance. I am interested in the situation where I take $N$ non-orthogonal unit vectors chosen such that the projections of the data onto these vectors are pairwise uncorrelated (zero covariance); this condition is equivalent to $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ being diagonal.

This means that my projections are "independent"; therefore I am pretty sure that their variances together cannot exceed the total variance: the total variance should bound the amount of variance that can be "distributed" between uncorrelated components, with the maximum achieved by the principal components.
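The setup can be checked numerically. A quick NumPy sketch (the whitening construction below is one hypothetical way to produce a non-orthogonal unit-column $\mathbf{V}$ with $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ diagonal, not the only one):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Covariance matrix S of some random data (positive definite almost surely).
X = rng.standard_normal((100, n))
S = X.T @ X

# Build a non-orthogonal basis V with unit-length columns that still
# diagonalizes S as a bilinear form: start from a whitening basis
# W = S^{-1/2} R (so W^T S W = R^T R = I), then rescale each column to
# unit length; diagonal rescaling keeps V^T S V diagonal.
evals, evecs = np.linalg.eigh(S)
S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
R = np.linalg.qr(rng.standard_normal((n, n)))[0]  # random orthogonal matrix
W = S_inv_sqrt @ R
V = W / np.linalg.norm(W, axis=0)                 # unit-length columns

D = V.T @ S @ V
print(np.max(np.abs(D - np.diag(np.diag(D)))))    # ~0: V^T S V is diagonal
print(np.trace(D), np.trace(S))                   # trace(V^T S V) <= trace(S)
```

Running this with various seeds, the off-diagonal part of $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ is zero up to floating-point error and its trace never exceeds $\mathrm{Tr}(\mathbf{S})$, consistent with the conjecture.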

Best Answer

Denote the scalar product of vectors $v,u$ by $(v,u)$ and the norm of a vector $v$ by $\|v\|=\sqrt{(v,v)}$.

Lemma 1. Let $A$ be a symmetric positive definite operator on $\mathbb{R}^n$ and $f\in \mathbb{R}^n$ a vector. Then $(Af,f)\cdot (A^{-1} f,f)\geq \|f\|^4$.

Proof. Let $A=B^2$, where $B=\sqrt{A}$ is positive. Then $(Af,f)=(B^2f,f)=(Bf,Bf)=\| Bf\|^2$, $(A^{-1}f,f)=\|B^{-1}f\|^2$ and we have to prove $\|Bf\|\cdot \|B^{-1} f\|\geq \|f\|^2$, but, by the Cauchy-Schwarz inequality, $\|Bf\|\cdot \|B^{-1} f\| \geq (Bf, B^{-1} f)=(f,f)=\|f\|^2$, as desired.
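As a sanity check (not part of the proof), Lemma 1 can be verified numerically on a random positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Random symmetric positive definite A and a random vector f.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
f = rng.standard_normal(n)

lhs = ((A @ f) @ f) * (np.linalg.solve(A, f) @ f)  # (Af,f) * (A^{-1}f,f)
rhs = (f @ f) ** 2                                 # ||f||^4
print(lhs >= rhs)                                  # True
```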

Lemma 2. If the diagonal elements of a symmetric positive definite matrix $A$ are all equal to $1$, then the diagonal elements of $A^{-1}$ are not less than $1$.

Proof. Apply Lemma 1 to the standard basis vectors: for $f=e_i$ we have $(Ae_i,e_i)=1$, so $(A^{-1}e_i,e_i)\geq \|e_i\|^4=1$.

Now assume that $D:=V^TSV={\rm diag}(c_1,\dots,c_n)$ (note that $V$, being a basis, is invertible). Then ${\rm tr}\, V^T SV=\sum c_i$, and by cyclicity of the trace $$ {\rm tr}\, S={\rm tr}\, V^TS(V^T)^{-1}={\rm tr}\, V^TSV (V^T V)^{-1}={\rm tr}\, D F^{-1}, $$ where $F=V^TV$ is a symmetric positive definite matrix with unit diagonal elements (the columns of $V$ have unit length). So, by Lemma 2, the diagonal elements $w_1,\dots,w_n$ of $F^{-1}$ are not less than $1$, and hence ${\rm tr}\, S=\sum c_i w_i\geq \sum c_i={\rm tr}\, D$.

As for the equality case: if $S$ is positive definite, then all $c_i>0$, so equality forces $w_i=1$ for every $i$. Since $(Fe_i,e_i)=1$ and $(F^{-1}e_i,e_i)=w_i=1$, this is the equality case of Lemma 1, i.e. equality in Cauchy-Schwarz, which means each $e_i$ is an eigenvector of $F$. Then $F$ is diagonal with unit diagonal, so $F=I$ and $V$ is orthogonal.
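The trace identity and Lemma 2 can be checked numerically as well. A NumPy sketch, building a non-orthogonal unit-column $V$ by normalizing the columns of a whitening basis $S^{-1/2}R$ with $R$ orthogonal (one hypothetical construction):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Symmetric positive definite S.
M = rng.standard_normal((n, n))
S = M @ M.T + np.eye(n)

# Non-orthogonal basis V with unit columns diagonalizing S.
evals, evecs = np.linalg.eigh(S)
R = np.linalg.qr(rng.standard_normal((n, n)))[0]
W = evecs @ np.diag(evals ** -0.5) @ evecs.T @ R
V = W / np.linalg.norm(W, axis=0)

D = V.T @ S @ V                  # diagonal
F = V.T @ V                      # symmetric positive definite, unit diagonal
F_inv = np.linalg.inv(F)

print(np.allclose(np.trace(S), np.trace(D @ F_inv)))  # trace identity
print(np.all(np.diag(F_inv) >= 1 - 1e-9))             # Lemma 2
```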
