A direct computation is also fine:
$$(PP^T)_{ij} = \sum_{k=1}^n P_{ik} P^T_{kj} = \sum_{k=1}^n P_{ik} P_{jk}$$
but most entries $P_{ik}$ are $0$, so most products $P_{ik} P_{jk}$ vanish. The only time $P_{ik}$ is nonzero is when it equals $1$, and in that case there is no other row $i' \neq i$ with $P_{i'k}$ nonzero ($i$ is the only row with a $1$ in column $k$), so the product $P_{ik}P_{jk}$ vanishes for $j \neq i$. Since row $i$ contains exactly one $1$, we get
$$\sum_{k=1}^n P_{ik} P_{jk} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$
and this is exactly the formula for the entries of the identity matrix, so
$$PP^T = I$$
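Here's a quick numpy sanity check of the computation above, using a $4 \times 4$ permutation matrix of my choosing (the permutation `sigma` below is just an example, not anything from the argument itself):

```python
import numpy as np

# Row i of P has its single 1 in column sigma[i] (0-indexed permutation).
sigma = [2, 0, 3, 1]
P = np.zeros((4, 4))
P[np.arange(4), sigma] = 1    # exactly one 1 per row and per column

# P P^T should be the identity, matching the entrywise sum above.
print(np.array_equal(P @ P.T, np.eye(4)))  # True
```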
If $0 \le A \le I$ (which you can arrange by rescaling any bounded positive $A$), then there is a classical proof that the following iteration converges in the strong operator topology to an operator $B$ with $0 \le B$ and $B^{2}=A$:
$$
B_{0}=0,\;\; B_{n+1}=B_{n}+\frac{1}{2}(A-B_{n}^{2}).
$$
There are some basic facts you need in the proof, and they are not straightforward. First: if $A$ and $B$ are bounded, commuting, selfadjoint operators, then $0 \le A \le B$ implies $A^{2} \le B^{2}$. This inequality is needed to show that $\{ B_{n}\}$ is a monotone sequence of operators, which reduces to showing that the following difference is positive:
$$
B_{n+2}-B_{n+1}=\frac{1}{2}(I-B_{n})^{2}-\frac{1}{2}(I-B_{n+1})^{2}.
$$
Then, you need to know that, because $0 \le B_{1} \le B_{2} \le B_{3}\le \cdots \le I$, the limit $\lim_{n}B_{n}x$ exists for all $x$ in the underlying Hilbert space. You end up with a unique $0 \le B \le I$ such that $B^{2}=A$, and such that $B$ commutes with every bounded operator which commutes with $A$.
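For intuition, the same iteration can be run numerically on a finite-dimensional example. Here is a sketch with a random positive semidefinite matrix of my own construction, rescaled so that $0 \le A \le I$; in finite dimensions strong convergence is just convergence of the matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M.T @ M                        # positive semidefinite
A = A / np.linalg.norm(A, 2)       # divide by the spectral norm, so 0 <= A <= I

B = np.zeros_like(A)               # B_0 = 0
for _ in range(500):
    B = B + 0.5 * (A - B @ B)      # B_{n+1} = B_n + (1/2)(A - B_n^2)

print(np.linalg.norm(B @ B - A))   # small: B is approximately the square root of A
```

Each $B_n$ is a polynomial in $A$, so the iterates all commute with $A$ and with each other, exactly as the proof requires.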
Convergence: I'm assuming the underlying Hilbert space is complex. The argument for convergence is not terribly long, so I'll give you that part. The only basic proof I know for the fact that $0 \le A \le B \implies A^{2} \le B^{2}$ for commuting $A$, $B$ is too long.
Once you have $0 \le B_{1} \le B_{2} \le B_{3} \le \cdots \le I$, then $\lim_{n}(B_{n}x,x)$ exists for all $x$ because $\{ (B_{n}x,x)\}_{n=1}^{\infty}$ is a bounded, non-decreasing sequence of real numbers for each fixed $x$. By the Polarization Identity,
$b(x,y)=\lim_{n}(B_{n}x,y)$ also exists for all $x,y$. $b$ is sesquilinear and symmetric with $0 \le b(x,x) =\lim_{n}(B_{n}x,x) \le \|x\|^{2}$. So $|b(x,y)|\le \|x\|\|y\|$ also holds.
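As a small sanity check of the polarization step, the following numpy sketch (the matrix and vectors are my own example) recovers $(Bx,y)$ from the quadratic form $u \mapsto (Bu,u)$ via the complex polarization identity $(Bx,y) = \frac{1}{4}\sum_{k=0}^{3} i^{k}\,(B(x+i^{k}y),\, x+i^{k}y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (M + M.conj().T) / 2                    # self-adjoint

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Inner product linear in the first slot, as in the text: (u, v) = v* u.
ip = lambda u, v: np.vdot(v, u)
q = lambda u: ip(B @ u, u)                  # the quadratic form u -> (Bu, u)

polarized = sum((1j**k) * q(x + (1j**k) * y) for k in range(4)) / 4
print(np.isclose(polarized, ip(B @ x, y)))  # True
```

Note the convention: `np.vdot` conjugates its first argument, so `ip` above is linear in the first slot, matching the sesquilinear form $b$ in the text.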
By the Lax-Milgram theorem there is a unique $0 \le B=B^{\star} \le I$ such that $b(x,y)=(Bx,y)$. And, $0 \le B_{n} \le B \le I$ for all $n$. Because $(x,y)_{n}=((B-B_{n})x,y)$ has all the properties of an inner product except possibly positive definiteness, the Cauchy-Schwarz inequality holds:
$$
|(x,y)_{n}|^{2} \le (x,x)_{n}(y,y)_{n} \le (x,x)_{n}(y,y)
$$
Now, let $y=(B-B_{n})x$ to obtain
$$
\|(B-B_{n})x\|^{2} \le ((B-B_{n})x,x).
$$
Therefore $\lim_{n}\|(B-B_{n})x\|=0$ for all $x$ because $\lim_{n}((B-B_{n})x,x)=0$.
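The final estimate can be illustrated numerically: if $0 \le C \le I$ (with $C$ playing the role of $B-B_{n}$), then $\|Cx\|^{2} \le (Cx,x)$, which is exactly the Cauchy-Schwarz step above with $y = Cx$. A quick sketch with a matrix of my own construction:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
C = M.T @ M
C = C / np.linalg.norm(C, 2)        # rescale so that 0 <= C <= I

x = rng.standard_normal(5)
# ||Cx||^2 = (C^2 x, x) <= (Cx, x), since C^2 <= C when the spectrum lies in [0,1]
print(np.linalg.norm(C @ x) ** 2 <= x @ C @ x + 1e-12)  # True
```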
By the inverse of $T$, you mean the map that "undoes" the matrix transpose. And that's accomplished by transposing again! So $T^{-1} = T$. You're right that $T^2 = I$, which means that $T$ is an involution (and yes, $T^2(A)$ is the same thing as $T(T(A))$). Finally, the inverse of $T$ doesn't have anything to do with inverting its input argument $A$. After all, transposition is defined for all square matrices, not just the invertible ones.
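A one-line numpy check of this point (the singular matrix $A$ below is just an example, chosen to emphasize that invertibility of $A$ is irrelevant):

```python
import numpy as np

A = np.array([[1, 2], [3, 6]])     # singular (det = 0), yet transposition is defined
T = lambda M: M.T                   # the transpose map

print(np.array_equal(T(T(A)), A))   # True: T(T(A)) = A, so T is its own inverse
```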