I understand now where your confusion stems from. It doesn't really matter whether we say there is a matrix $P$ such that $PAP^{-1}$ is diagonal, or that there is a matrix $P$ such that $P^{-1}AP$ is diagonal: the two definitions are equivalent, by replacing $P$ with $P^{-1}$. The same thing happens with orthogonal diagonalization: there is a matrix $Q$ such that $QAQ^T$ is diagonal if and only if there is a matrix $Q$ such that $Q^TAQ$ is diagonal, again by replacing $Q$ with $Q^T$. So don't worry about where you write the transpose or the inverse; just be consistent once you decide on one way to write that a matrix is diagonalizable or orthogonally diagonalizable.
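To see the equivalence concretely, here is a minimal numerical sketch using NumPy (the particular matrix $A$ is a hypothetical example of my own, not from the question): if $P^{-1}AP$ is diagonal, then setting $R = P^{-1}$ makes $RAR^{-1}$ that same diagonal matrix.

```python
import numpy as np

# Hypothetical symmetric 2x2 example; P holds eigenvectors of A as columns.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, P = np.linalg.eig(A)

# P^{-1} A P is diagonal...
D1 = np.linalg.inv(P) @ A @ P

# ...and with R = P^{-1}, R A R^{-1} is the same diagonal matrix.
R = np.linalg.inv(P)
D2 = R @ A @ np.linalg.inv(R)

assert np.allclose(D1, np.diag(eigvals))
assert np.allclose(D1, D2)
```

So the two conventions name the same property, differing only in which matrix is called $P$.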
Gerry's answer shows exactly how to stay consistent once the definitions are written down.
I assume $P$ is a real-valued matrix. (If $\mathbb C$ is required, the argument below can be slightly altered to use Hermitian forms instead.)
Consider the coordinate vector space $V=\mathbb R^n$ and the linear operator on it given by $T:= P^{-1}AP$, so that $T^k = I$. It suffices to show that $T$ is similar to a real orthogonal matrix. Since $T^k$ is nonsingular, so is $T$.
With $\langle \cdot, \cdot \rangle$ denoting the standard real inner product, we define the following custom symmetric bilinear form: for $v,v' \in V$,
$\langle v, v' \rangle_c := \frac{1}{k}\sum_{j=0}^{k-1}\langle T^j v, T^j v'\rangle$.
It is immediate that this form is positive definite, since the $j=0$ term is $\frac{1}{k}\|v\|^2 > 0$ for $v \neq 0$ and the remaining terms are nonnegative. Further, notice
$\langle Tv, Tv' \rangle_c $
$= \frac{1}{k}\sum_{j=0}^{k-1}\langle T^{j+1}v, T^{j+1}v'\rangle $
$= \frac{1}{k}\Big(\sum_{j=0}^{k-2}\langle T^{j+1}v, T^{j+1}v'\rangle\Big) + \frac{1}{k}\langle T^{k}v, T^{k}v'\rangle$
$= \frac{1}{k}\Big(\sum_{j=1}^{k-1}\langle T^{j}v, T^{j}v'\rangle\Big) + \frac{1}{k}\langle v, v'\rangle \qquad \text{(re-indexing and using } T^k = I\text{)}$
$= \frac{1}{k}\sum_{j=0}^{k-1}\langle T^j v, T^j v'\rangle$
$=\langle v,v' \rangle_c $
This implies $T$ is an orthogonal operator with respect to the custom bilinear form.
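The invariance $\langle Tv, Tv' \rangle_c = \langle v, v' \rangle_c$ can be checked numerically. In matrix terms the custom form is $\langle v, v' \rangle_c = v^T G v'$ with $G = \frac{1}{k}\sum_{j=0}^{k-1} (T^j)^T T^j$, and orthogonality of $T$ for this form reads $T^T G T = G$. A minimal NumPy sketch, using a hypothetical $T$ with $T^k = I$ built by conjugating a rotation (the matrices $R$ and $S$ are my own choices, not from the question):

```python
import numpy as np

k = 5
theta = 2 * np.pi / k
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation, so R^k = I
S = np.array([[2.0, 1.0],
              [0.0, 1.0]])                        # arbitrary invertible matrix
T = S @ R @ np.linalg.inv(S)                      # T^k = I, but T is not orthogonal

# Gram matrix of the custom form: <v, v'>_c = v^T G v'
G = sum(np.linalg.matrix_power(T, j).T @ np.linalg.matrix_power(T, j)
        for j in range(k)) / k

# T is orthogonal for the custom form: T^T G T = G
assert np.allclose(T.T @ G @ T, G)
# The form is positive definite: all eigenvalues of G are positive
assert np.all(np.linalg.eigvalsh(G) > 0)
# Sanity check: T is NOT orthogonal for the standard inner product
assert not np.allclose(T.T @ T, np.eye(2))
```

This illustrates the point of the construction: $T$ fails to be orthogonal for the standard inner product, but averaging over the powers of $T$ produces a form for which it is.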
Now apply $T$ to a well-chosen basis:
$T\mathbf B=\mathbf BQ$
where $\mathbf B$ is selected to be some orthonormal basis with respect to the custom bilinear form and $Q$ is some matrix. Since our vector space is $V=\mathbb R^n$, we note that $\mathbf B$ may also be interpreted as an invertible matrix.
$\langle v, v' \rangle_c = \langle Tv, Tv' \rangle_c \longrightarrow$ $Q$ is orthogonal with respect to the standard inner product.
Finally
$T =T\big(\mathbf B\mathbf B^{-1}\big) = \big(T\mathbf B\big)\mathbf B^{-1}= \big(\mathbf BQ\big)\mathbf B^{-1}= \mathbf BQ\mathbf B^{-1}$
thus $T$ is similar to an orthogonal matrix.

Detailed justification that $Q^TQ = I$:
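This similarity can also be verified numerically. One concrete way to produce a $\langle\cdot,\cdot\rangle_c$-orthonormal basis $\mathbf B$ is via a Cholesky factorization of the Gram matrix $G$ of the custom form: if $G = LL^T$ then $\mathbf B = (L^T)^{-1}$ satisfies $\mathbf B^T G \mathbf B = I$. A sketch under the same hypothetical choice of $T$ as before (all matrices are illustrative, not from the question):

```python
import numpy as np

# A hypothetical T with T^k = I, conjugate to a rotation
k = 5
theta = 2 * np.pi / k
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.array([[2.0, 1.0],
              [0.0, 1.0]])
T = S @ R @ np.linalg.inv(S)

# Gram matrix of the custom form
G = sum(np.linalg.matrix_power(T, j).T @ np.linalg.matrix_power(T, j)
        for j in range(k)) / k

# Columns of B are orthonormal for <,>_c exactly when B^T G B = I;
# with G = L L^T (Cholesky), B = (L^T)^{-1} works.
L = np.linalg.cholesky(G)
B = np.linalg.inv(L.T)
assert np.allclose(B.T @ G @ B, np.eye(2))

# Q = B^{-1} T B is orthogonal for the STANDARD inner product,
# exhibiting the similarity T = B Q B^{-1}.
Q = np.linalg.inv(B) @ T @ B
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(T, B @ Q @ np.linalg.inv(B))
```

Algebraically this is the computation $Q^TQ = \mathbf B^T T^T (\mathbf B^{-T}\mathbf B^{-1}) T \mathbf B = \mathbf B^T T^T G T \mathbf B = \mathbf B^T G \mathbf B = I$.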
Let $v = \mathbf B\mathbf x$ and $v' =\mathbf B \mathbf y$, and write $\mathbf w = Q\mathbf x$ and $\mathbf z = Q\mathbf y$. Then
$\langle T v, Tv'\rangle_c$
$=\langle T\mathbf B\mathbf x, T\mathbf B\mathbf y\rangle_c$
$=\langle \mathbf B (Q\mathbf x), \mathbf B(Q\mathbf y)\rangle_c$
$=\langle \mathbf B \mathbf w, \mathbf B\mathbf z\rangle_c$
$=\langle \sum_{k=1}^n \mathbf b_k w_k , \sum_{i=1}^n \mathbf b_i z_i\rangle_c$
$=\sum_{k=1}^n w_k\langle \mathbf b_k , \sum_{i=1}^n \mathbf b_i z_i\rangle_c$
$=\sum_{k=1}^n w_k\sum_{i=1}^n z_i \langle \mathbf b_k , \mathbf b_i \rangle_c$
$=\sum_{k=1}^n w_k z_k\langle \mathbf b_k , \mathbf b_k \rangle_c$
$=\sum_{k=1}^n w_k z_k$
$=\mathbf w^T\mathbf z$
$=\mathbf x^T Q^T Q\mathbf y$
and by a virtually identical calculation
$\langle v, v'\rangle_c = \mathbf x^T \mathbf y\longrightarrow \mathbf x^T \mathbf y = \mathbf x^T Q^T Q\mathbf y$
where the implication follows because $\langle Tv, Tv'\rangle_c = \langle v, v'\rangle_c$
Since the above holds for the selection of arbitrary $\mathbf x$ and $\mathbf y$ we conclude that $Q$ is orthogonal with respect to the standard inner product.
Note:
The above also gives a proof that $M^k = I$ implies $M$ is diagonalizable over $\mathbb C$, since $I$ is just a special case of a real orthogonal matrix: the argument shows $M$ is similar to a real orthogonal matrix, which by the spectral theorem is similar to a diagonal matrix (over $\mathbb C$). The standard proof of this result that you'll see on this site uses a minimal polynomial argument, though the minimal polynomial doesn't seem to apply as well to OP's question.
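A quick numerical illustration of this corollary, with a hypothetical real matrix satisfying $M^3 = I$ (its characteristic polynomial is $x^2 + x + 1$, so its eigenvalues are primitive cube roots of unity and $M$ is only diagonalizable over $\mathbb C$):

```python
import numpy as np

# Companion matrix of x^2 + x + 1, so M^3 = I
M = np.array([[0.0, -1.0],
              [1.0, -1.0]])
assert np.allclose(np.linalg.matrix_power(M, 3), np.eye(2))

# Over C, the eigenvector matrix V is invertible, so M = V D V^{-1}:
# M is diagonalizable over C even though its eigenvalues are not real.
eigvals, V = np.linalg.eig(M)
assert np.allclose(V @ np.diag(eigvals) @ np.linalg.inv(V), M)
# The eigenvalues lie on the unit circle, as expected for M^3 = I
assert np.allclose(np.abs(eigvals), 1.0)
```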
Best Answer
If $M$ is a real symmetric matrix, then by the first part of the theorem, it is diagonalizable, that is
$$M = P D P^{-1}$$
where $D$ is a diagonal matrix with entries equal to the eigenvalues of $M$, and where the columns in $P$ are the corresponding eigenvectors. What we need to show is that the matrix $P$ can in fact be chosen to be orthogonal. To see this, we need the other part of the theorem, namely that $M$ has an orthonormal basis of eigenvectors. By choosing an orthonormal basis of eigenvectors, the corresponding $P$ becomes orthogonal.
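A minimal NumPy sketch of this choice (the matrix $M$ is a hypothetical example): `numpy.linalg.eigh` returns an orthonormal basis of eigenvectors for a real symmetric matrix, so the resulting $P$ is orthogonal and $P^{-1} = P^T$.

```python
import numpy as np

# A hypothetical real symmetric matrix
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is designed for symmetric matrices and returns eigenvectors
# that already form an orthonormal basis, so P is orthogonal.
eigvals, P = np.linalg.eigh(M)
D = np.diag(eigvals)

assert np.allclose(P.T @ P, np.eye(3))   # P is orthogonal, P^{-1} = P^T
assert np.allclose(P @ D @ P.T, M)       # M = P D P^{-1} with orthogonal P
```

Had we used a generic eigenvector routine instead, the columns of $P$ would still be eigenvectors but not necessarily orthonormal; choosing an orthonormal eigenbasis is exactly what makes $P$ orthogonal.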