There are two different typical situations for this kind of problem:
i) you want to generate a sample from a given distribution whose population characteristics match the ones specified (but, due to sampling variation, the sample characteristics won't match them exactly).
ii) you want to generate a sample whose sample characteristics match the ones specified (but, because of the constraint that the sample quantities exactly match a prespecified set of values, the sample doesn't really come from the distribution you want).
You want the second case -- but you get it by following the same approach as the first case, with an extra standardization step.
So for multivariate normals, either can be done in a fairly straightforward manner:
With the first case you could use random normals without the population structure (such as iid standard normals, which have expectation $0$ and identity covariance matrix) and then impose it: transform to get the covariance matrix and mean you want. If $\mu$ and $\Sigma$ are the population mean and covariance you need and $z$ is iid standard normal, you calculate $y=Lz+\mu$ for some $L$ where $LL'=\Sigma$ (e.g. a suitable $L$ can be obtained via the Cholesky decomposition). Then $y$ has the desired population characteristics.
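For concreteness, here is a minimal numpy sketch of the first case (the particular $\mu$, $\Sigma$ and sample size are just made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up target population moments
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

n = 1000
Z = rng.standard_normal((n, len(mu)))  # iid standard normals, one observation per row
L = np.linalg.cholesky(Sigma)          # lower-triangular L with L @ L.T == Sigma

Y = Z @ L.T + mu                       # y_i = L z_i + mu, row by row

# Sample moments are close to, but not exactly equal to, mu and Sigma
print(Y.mean(axis=0))
print(np.cov(Y, rowvar=False))
```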
With the second, you have to first transform your random normals to remove even the random variation away from the zero mean and identity covariance (making the sample mean exactly zero and the sample covariance exactly $I$), then proceed as before. But that initial step of removing the sample deviation from exact mean $0$ and variance $I$ interferes with the distribution. (In small samples the effect can be quite severe.)
This can be done by subtracting the sample mean of $z$ (giving $z^*=z-\bar z$) and computing the Cholesky decomposition of the sample covariance matrix of $z^*$. If $L^*$ is that left Cholesky factor, then $z^{(0)}=(L^*)^{-1}z^*$ has sample mean $0$ and identity sample covariance. You can then calculate $y=Lz^{(0)}+\mu$ and have a sample with the desired sample moments. (Depending on how your sample quantities are defined, there may be a small extra fiddle involving multiplication or division by factors like $\sqrt{\frac{n-1}{n}}$, but it's easy enough to identify when that's needed.)
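Continuing the sketch above for the second case (same made-up $\mu$ and $\Sigma$); note that np.cov uses the $n-1$ denominator, so the sample moments match exactly under that convention:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])        # made-up target values, as before
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
n = 50

Z = rng.standard_normal((n, len(mu)))
Zc = Z - Z.mean(axis=0)                # remove the sample mean

S = np.cov(Zc, rowvar=False)           # sample covariance (n-1 denominator)
Lstar = np.linalg.cholesky(S)          # left Cholesky factor of the sample covariance
Z0 = Zc @ np.linalg.inv(Lstar).T       # sample mean 0, identity sample covariance

L = np.linalg.cholesky(Sigma)
Y = Z0 @ L.T + mu

# With a 1/n covariance convention you would rescale Z0 by sqrt(n/(n-1))
# (the small "fiddle" mentioned above).
print(Y.mean(axis=0))                  # exactly mu, up to floating point
print(np.cov(Y, rowvar=False))         # exactly Sigma, up to floating point
```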
I will write matrix square roots of $\Sigma$ as $\Sigma=A A^T$, to be consistent with the Cholesky decomposition, which is written as $\Sigma = L L^T$ where $L$ is lower triangular. So let $X$ be a random vector with $\DeclareMathOperator{\E}{\mathbb{E}} \DeclareMathOperator{\Cov}{\mathbb{Cov}}\DeclareMathOperator{\Var}{\mathbb{Var}} \E X=\mu$ and $\Var X =\Sigma$. Now let $Z$ be a random vector with expectation zero and identity covariance matrix.
Note that there are (except in the scalar case) infinitely many matrix square roots. If we let $A$ be one of them, then we can find all the others as $A \mathcal{O}$, where $\mathcal{O}$ is any orthogonal matrix, that is, $\mathcal{O} \mathcal{O}^T = \mathcal{O}^T \mathcal{O} =I$. This is known as the unitary freedom of square roots.
Let us look at some particular matrix square roots.
First, a square root obtained from the spectral decomposition. Write $\Sigma = U \Lambda U^T = U\Lambda^{1/2}(U\Lambda^{1/2})^T$, so $A=U\Lambda^{1/2}$ is a square root, and it can be interpreted via the PCA (principal component analysis) of $\Sigma$. (Note that this $A$ is not symmetric in general; the symmetric square root is $\Sigma^{1/2}=U\Lambda^{1/2}U^T$.)
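As a quick numerical illustration (with an arbitrary $2\times2$ covariance matrix), numpy's eigendecomposition gives both of these square roots directly:

```python
import numpy as np

# Arbitrary illustrative covariance matrix
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

# Spectral decomposition: Sigma = U @ diag(lam) @ U.T
lam, U = np.linalg.eigh(Sigma)
A_pca = U @ np.diag(np.sqrt(lam))        # square root from the spectral decomposition
A_sym = U @ np.diag(np.sqrt(lam)) @ U.T  # the symmetric square root

print(np.allclose(A_pca @ A_pca.T, Sigma))  # True
print(np.allclose(A_sym @ A_sym.T, Sigma))  # True
print(np.allclose(A_sym, A_sym.T))          # True: only this one is symmetric
```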
Next, the Cholesky decomposition $\Sigma=L L^T$, where $L$ is lower triangular. We can represent $X$ as $X=\mu + L Z$. Multiplying out to get scalar equations, we get a triangular system in $Z$, which in the time-series case can be interpreted as an MA (moving average) representation.
For the general case $A= L \mathcal{O}$, using the above we can interpret it as an MA representation after first rotating $Z$.
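A small numerical check of the unitary freedom, using an arbitrary $2\times2$ rotation matrix as $\mathcal{O}$:

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
L = np.linalg.cholesky(Sigma)

theta = 0.7                                    # arbitrary rotation angle
O = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

A = L @ O                                      # another square root of Sigma
print(np.allclose(A @ A.T, Sigma))             # True
print(np.allclose(A, L))                       # False: a genuinely different matrix
```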
Best Answer
Your answer is good. Note that since $\Sigma$ is symmetric (and square), so is $\Sigma^{-1}$. The matrix, its transpose, and its inverse all map your vector $r$ into the same space.
Since $\Sigma$ and $\Sigma^{-1}$ are positive definite, all their eigenvalues are positive. Thus multiplying a vector by either of them always gives a result in the same half-space as the original vector (the angle between $r$ and $\Sigma r$ is less than $90^\circ$).
Now, if $\Sigma$ or $\Sigma^{-1}$ were a diagonal matrix, the multiplication would only reweight (or undo the reweighting of) the lengths of the target vector in each dimension, as you noticed. If they are full matrices, then the matrix is full rank since it is positive definite, the eigendecomposition exists, and $\Sigma = V \Lambda V^T$, where $V$ is an orthonormal eigenvector matrix (by virtue of $\Sigma$ being symmetric, so $V^{-1}=V^T$) and $\Lambda$ is the diagonal matrix of eigenvalues. Thus $r$ is first rotated by $V^T$, then reweighted by $\Lambda$, then rotated back by $V$. The same goes for $\Sigma^{-1} = V \Lambda^{-1} V^T$, except that the scaling is by the diagonal of reciprocals $\Lambda^{-1}$, so it exactly undoes the effect of $\Sigma$. It is easy to see they are opposite processes.
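A quick numerical check of this rotate, scale, rotate-back picture (with an arbitrary $\Sigma$ and $r$, purely for illustration):

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
r = np.array([1.0, 2.0])

lam, V = np.linalg.eigh(Sigma)           # Sigma = V @ diag(lam) @ V.T, V orthonormal

# Sigma @ r: rotate by V.T, scale by lam, rotate back by V
step = V @ np.diag(lam) @ (V.T @ r)
print(np.allclose(step, Sigma @ r))      # True

# Sigma^{-1} @ r: same rotations, reciprocal scaling
step_inv = V @ np.diag(1.0 / lam) @ (V.T @ r)
print(np.allclose(step_inv, np.linalg.solve(Sigma, r)))  # True
```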
Additionally, you may think of $$ r^T \Sigma^{-1} r = (\Lambda^{-1/2} V^T r)^T(\Lambda^{-1/2} V^T r) = \big\|\Lambda^{-1/2} V^T r\big\|^2 $$ as the squared length of your vector $r$, reweighted by the "standard deviations" after correcting for cross-correlations.
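And the corresponding check (same made-up $\Sigma$ and $r$ as above) that the quadratic form equals the squared length of the rescaled vector:

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
r = np.array([1.0, 2.0])

lam, V = np.linalg.eigh(Sigma)
w = V.T @ r / np.sqrt(lam)               # Lambda^{-1/2} V^T r

quad = r @ np.linalg.solve(Sigma, r)     # r^T Sigma^{-1} r
print(np.allclose(quad, w @ w))          # True: squared length of the rescaled vector
```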
Hope that helps.