I believe $I+N$ must indeed be "positive definite", in the sense that $x^t(I+N)x>0$ for all non-zero real $x$; equivalently, $(I+N)+(I+N)^t$ is symmetric positive definite.
Apply induction on the dimension. We are given a nilpotent matrix $N'$, which by Schur decomposition we can assume is strictly upper triangular (an orthogonal change of basis preserves both properties below). As a block matrix, $N'=\begin{pmatrix}N&x\\0&0\end{pmatrix}$ with $N$ a square strictly upper triangular matrix and $x$ a column vector. By assumption $(I+N')^2+(I+N')^{2t}$ is SPD (SPD = symmetric positive definite; the superscript $2t$ means square then transpose, so $M^{2t}=(M^2)^t$). We wish to show that $(I+N')+(I+N')^t$ is SPD.
Let $U=I+N$ and note
$$\begin{pmatrix}U&x\\0&1\end{pmatrix}^2+\begin{pmatrix}U&x\\0&1\end{pmatrix}^{2t}=\begin{pmatrix}U^2+U^{2t}&(I+U)x\\x^t(I+U)^t&2\end{pmatrix}.$$
By the Schur complement characterization of SPD block matrices we have by assumption:
- $U^2+U^{2t}$ is SPD
- $2-x^t(I+U)^t(U^2+U^{2t})^{-1}(I+U)x>0$
And we need to show:
- $U+U^t$ is SPD
- $2-x^t(U+U^t)^{-1}x>0.$
$U+U^t$ is SPD by induction. The latter inequality would follow from
$$x^t(I+U)^t(U^2+U^{2t})^{-1}(I+U)x\geq x^t(U+U^t)^{-1}x$$
Letting $y=(I+U)x,$ this is
$$y^t(U^2+U^{2t})^{-1}y\geq y^t(I+U^t)^{-1}(U+U^t)^{-1}(I+U)^{-1}y$$
This would follow from the fact that $0\prec A\preceq B$ implies $B^{-1}\preceq A^{-1}$, together with the symmetric positive definiteness of
$$(I+U)(U+U^t)(I+U^t)-(U^2+U^{2t})=U+U^t+2UU^t+U(U+U^t)U^t$$
Since $U+U^t$ is SPD, so is $U(U+U^t)U^t$ (as $U=I+N$ is invertible), and $UU^t$ is SPD for the same reason. So we are done.
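As a sanity check of the implication (not a substitute for the proof), here is a small randomized test in numpy; the sizes, seed, and trial count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def min_eig_sym(M):
    # Smallest eigenvalue of the symmetric matrix M.
    return np.linalg.eigvalsh(M).min()

held = 0
for _ in range(10000):
    n = int(rng.integers(2, 6))
    # Random strictly upper triangular (hence nilpotent) N'.
    N = np.triu(rng.standard_normal((n, n)), k=1)
    U = np.eye(n) + N
    if min_eig_sym(U @ U + (U @ U).T) > 1e-8:   # hypothesis: U^2 + U^{2t} is SPD
        assert min_eig_sym(U + U.T) > 0          # conclusion: U + U^t is SPD
        held += 1

print(f"implication verified in {held} sampled cases")
```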
See loup blanc's answer for a generalization of the above argument from unipotent matrices to all matrices with positive spectrum.
Here is a reference with a nice proof: *Uniqueness of matrix square roots and an application* by Charles R. Johnson, Kazuyoshi Okubo, and Robert Reams, Theorem 7. It uses the following theorem:
Theorem (Lyapunov). Let $A\in\mathbb C^{n\times n}$ (not necessarily Hermitian) and let $X\in\mathbb C^{n\times n}$ be Hermitian. If the eigenvalues of $A$ all have positive real part and $AX+XA^*$ is positive definite, then $X$ is positive definite.
In particular if $A$ has eigenvalues with positive real part and $A(A+A^*)+(A+A^*)A^*=A^2+(A^*)^2+2AA^*$ is positive definite, then $A+A^*$ is positive definite.
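Before the proofs, here is a quick numerical illustration of the theorem using SciPy's Lyapunov solver; the shifted random matrix is merely one convenient way to manufacture an $A$ whose eigenvalues all have positive real part:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
# Shift B so that every eigenvalue of A has positive real part
# (an arbitrary construction chosen just for this test).
A = B + (np.abs(np.linalg.eigvals(B).real).max() + 1.0) * np.eye(n)

W = rng.standard_normal((n, n))
Q = W @ W.T + n * np.eye(n)            # a positive definite right-hand side

X = solve_continuous_lyapunov(A, Q)    # solves A X + X A^H = Q
print(np.allclose(X, X.T))             # X is Hermitian (real symmetric here)
print(np.linalg.eigvalsh(X).min() > 0) # and, as the theorem predicts, positive definite
```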
Proof 1 (sketch) following Horn and Johnson's Topics in Matrix Analysis:
Suppose not. Since the eigenvalues of $A$ have positive real part, we can take a modified Jordan normal form in which the above-diagonal $1$'s are made arbitrarily small, so that $S^{-1}AS+(S^{-1}AS)^*$ is positive definite for some non-singular $S.$ Setting $G=SS^*$ we find that $AG+GA^*$ is positive definite. For $0\leq \theta\leq 1$ define $X_{\theta}=\theta G+(1-\theta)X.$ Note:
- $X_0=X$ is not positive definite
- $X_1=G$ is positive definite
- $AX_\theta+X_\theta A^*$ is a convex combination of positive definite matrices so must be positive definite.
All the matrices $X_\theta$ are Hermitian and hence have real eigenvalues, and by continuity some $X_\theta$ must have $0$ as an eigenvalue: $X_\theta v=0$ for some non-zero $v.$ But then $v^*(AX_\theta+X_\theta A^*)v=0,$ contradicting the positive definiteness of $AX_\theta+X_\theta A^*.$
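The continuity step can be visualized numerically: along any Hermitian path from a non-positive-definite $X$ to a positive definite $G$, the smallest eigenvalue must cross zero. The matrices below are arbitrary illustrative choices, not taken from the proof:

```python
import numpy as np

X = np.diag([-1.0, 2.0])   # not positive definite (plays the role of X_0)
G = np.eye(2)              # positive definite (plays the role of X_1)

# Track the smallest eigenvalue of X_theta = theta*G + (1-theta)*X.
thetas = np.linspace(0.0, 1.0, 1001)
mins = [np.linalg.eigvalsh(t * G + (1 - t) * X).min() for t in thetas]
crossing = thetas[int(np.argmin(np.abs(mins)))]
print(f"smallest eigenvalue vanishes near theta = {crossing:.3f}")  # 0.500 here
```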
Proof 2.
Consider the function defined by $f(W)=\int_0^\infty e^{-tA}We^{-tA^*}dt$; the integral converges because the eigenvalues of $A$ have positive real part. We can compute
\begin{align*}
W&=-\frac d{d\tau}\Bigr|_{\tau=0}\int_{\tau}^\infty e^{-tA}We^{-tA^*}dt\\
&=-\frac{d}{d\tau}\Bigr|_{\tau=0}\int_0^\infty e^{-\tau A}e^{-tA}We^{-tA^*}e^{-\tau A^*}dt\\
&=Af(W)+f(W)A^*.
\end{align*}
This means the map $X\mapsto AX+XA^*$ is a left inverse of $f.$ Since $f$ is an $\mathbb R$-linear map from the space of Hermitian matrices to itself, and has a left inverse, it must be an isomorphism. So the solution $X$ of $AX+XA^*=W$ is unique. And if $W$ is positive definite, then so is $e^{-tA}W^{1/2}W^{1/2}e^{-tA^*},$ so $f(W)$ is positive definite by construction. This proves that if $AX+XA^*$ is positive definite then so is $X.$
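As a numerical sketch of this proof (under the assumption that the spectrum of $A$ lies in the right half plane, manufactured below by an arbitrary shift), one can approximate $f(W)$ by quadrature, truncating the integral where the exponential decay makes the tail negligible, and verify both the identity $Af(W)+f(W)A^*=W$ and the positive definiteness of $f(W)$:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
# Shift so that every eigenvalue of A has positive real part.
A = B + (np.abs(np.linalg.eigvals(B).real).max() + 1.0) * np.eye(n)

V = rng.standard_normal((n, n))
W = V @ V.T + n * np.eye(n)        # a positive definite W

# f(W) = int_0^inf e^{-tA} W e^{-tA^*} dt; the integrand decays at least
# like e^{-2t} here, so truncating at t = 40 is harmless.
fW, _ = quad_vec(lambda t: expm(-t * A) @ W @ expm(-t * A).T, 0.0, 40.0)

print(np.allclose(A @ fW + fW @ A.T, W, atol=1e-6))  # A f(W) + f(W) A^* = W
print(np.linalg.eigvalsh(fW).min() > 0)              # f(W) is positive definite
```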
It is generally true that if $A$ is an invertible $n\times n$ matrix and $A^{-1}$ has a "square root" $C$, also $n\times n$, such that:
$$ A^{-1} = C^2 $$
then $C^{-1} A^{-1} C^{-1} = I$ holds.
The first fact we need is that since $A$ is invertible, $A^{-1}$ is invertible, and this implies $C$ is invertible. For if not, then there would exist a nonzero vector $x$ in the nullspace of $C$, and $Cx=0$ would imply $A^{-1}x=0$, contradicting the invertibility of $A^{-1}$. Thus $A = (C^2)^{-1} = (C^{-1})^2$.
The second fact we need is that a one-sided inverse of a matrix is a two-sided inverse, so that:
$$ A C^2 = I \; \implies \; C A C = I $$
That is, using associativity of matrix multiplication, the left hand side tells us $(AC)C = I$, so that $C$ is a (right) inverse of $AC$. Thus it must also be a (left) inverse of $AC$, which is what the right hand equation states.
Finally by the same reasoning:
$$ C A C = I \; \implies \; C^{-1} A^{-1} C^{-1} = I $$
In this discussion/proof we have not invoked the symmetry of $A$, nor the uniqueness of a symmetric positive definite square root $C$ of $A^{-1}$ (which, when $A$ is symmetric positive definite, exists and is itself symmetric positive definite). The reasoning above is correct even if $A$ is not symmetric and even if $C$ is not positive definite; it relies only on $A^{-1}=C^2$.
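This can be checked numerically with any square root produced by scipy.linalg.sqrtm; the example below deliberately uses a non-symmetric $A$ (the shift by $3I$ is only there to keep the principal square root well defined, not something the claim requires):

```python
import numpy as np
from scipy.linalg import sqrtm, inv

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # invertible, not symmetric

C = sqrtm(inv(A))        # one square root of A^{-1}, i.e. C^2 = A^{-1}
Ci = inv(C)

print(np.allclose(C @ C, inv(A)))                # C is indeed a square root of A^{-1}
print(np.allclose(Ci @ inv(A) @ Ci, np.eye(n)))  # C^{-1} A^{-1} C^{-1} = I
print(np.allclose(C @ A @ C, np.eye(n)))         # equivalently, C A C = I
```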
Best Answer
The Cholesky factorization $A=R^{T}R$ where $R$ is upper triangular with positive entries on the diagonal can often be used as an effective "square root" of a symmetric and positive definite matrix $A$. It's relatively fast to compute.
If $A$ is symmetric and positive definite, it will also have a symmetric and positive definite matrix square root $S$, with $A=S^{2}$. This can be computed from the eigenvalue decomposition of $A$ as explained in Jacky Chong's answer, or by some special purpose algorithms. The symmetric matrix square root is quite a bit slower to compute than the Cholesky factorization.
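A minimal sketch of both factorizations in numpy/scipy, on a randomly generated SPD test matrix: the Cholesky factor $R$ is upper triangular with $A=R^TR$, while the symmetric square root $S$ (computed here both with scipy.linalg.sqrtm and from the eigendecomposition) satisfies $A=S^2$:

```python
import numpy as np
from scipy.linalg import cholesky, sqrtm

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)     # a symmetric positive definite test matrix

R = cholesky(A)                 # upper triangular Cholesky factor, A = R^T R
S = sqrtm(A)                    # symmetric positive definite square root, A = S^2

# The same square root via the eigendecomposition A = Q diag(w) Q^T.
w, Q = np.linalg.eigh(A)
S_eig = Q @ np.diag(np.sqrt(w)) @ Q.T

print(np.allclose(R.T @ R, A))                     # Cholesky reproduces A
print(np.allclose(S @ S, A), np.allclose(S, S.T))  # S^2 = A, S symmetric
print(np.allclose(S, S_eig))                       # both constructions agree
```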
It is unfortunate that some authors choose to refer to the Cholesky factorization as a "square root", but this is fairly common. It would be better if these authors used the term Cholesky factorization instead.