This is correct, thought it could be stated a little better, and there is a shorter argument.
To state it better, note that you're really just doing an induction on $n$: you're showing that if the transfer of $n-1$ disks goes through all possible arrangements, then so does the transfer of $n$ disks.
The short argument is simply to count the arrangements, as you did, and note that the optimal sequence of moves never repeats a position. (If it did, it couldn't be optimal.) Thus, it must go through every possible arrangement.
I will show it by induction on $n$ (the number of disks). Let $T(n)$ be the number of moves needed to go from an arbitrary configuration to another arbitrary configuration. Let $P(n)$ be the proposition "$T(n)\leq 2^n-1$". That is, $P(n)$ is the proposition "All starting and ending configurations of $n$ disks are at most $2^n-1$ moves apart.". I want to show that $P(n)$ is true for every natural number $n$.
For the base case, $P(0)$ is true, because it is clear that $T(0)=0\leq 2^0-1=0$.
For the inductive hypothesis, let's assume that $P(n-1)$ is true; that is, let's assume that it is true that $T(n-1)\leq 2^{n-1}-1$. Now, I will show that $P(n-1)$ implies $P(n)$.
First method
Suppose I have a configuration of $n-1$ disks. By the inductive hypothesis, we know that $T(n-1)\leq 2^{n-1}-1$. Now, let's consider what will change if we add one more disk (an $n^{th}$ disk), smaller than every other disk in the configuration, so that we have a situation where there are $n$ disks. Since this $n^{th}$ disk is the smallest one, it will always be on top in one of the three pegs, due to the rules of the problem.
The bottom $n-1$ disks will make $T(n-1)$ moves. For each move of the $n-1$ disks, in the worst case, the new smallest disk will be either on top of the disk that wants to move, or on top of the peg to which the disk wants to move. So, the new disk will have to make at most one move for each move of the other $n-1$ disks, that is, the new disk will make at most $T(n-1)$ moves. At last, when the $n-1$ disks are already at their correct positions in the end configuration, the new disk may have to make one more move to reach its correct position in the end configuration.
So, there will be at most $T(n) = T(n-1) + T(n-1) + 1 \leq 2^{n-1}-1+2^{n-1}-1+1 = 2^n - 2 + 1 = 2^n - 1$ moves. Therefore, $T(n) \leq 2^n - 1$, which shows that $P(n)$ is true.
Second method
I can also build the inductive step by considering that the new disk is the largest disk (instead of being the smallest one), as explained in the comments below.
Suppose we have a configuration of $n-1$ disks. By the inductive hypothesis, these $n-1$ disks need $T(n-1) \leq 2^{n-1}-1$ moves to reach any other configuration. Now, let's add an $n^{th}$ disk, which is the largest disk. We have two cases:
(1) If the largest disk doesn't move from the starting configuration to the end configuration, then we can just move the top $n-1$ disks as if there were only $n-1$ disks; so, in this case, it takes at most $2^{n-1} - 1$ moves. Therefore, it is true that $T(n) \leq 2^{n} - 1$. So, $P(n)$ is true in this case.
(2) If the largest disk moves, so we first have to move it to its end configuration. For this, we may have to move the top $n-1$ disks to a peg so that they are out of the way; by the inductive hypothesis, this takes at most $2^{n-1} - 1$ moves. Then, we move the largest disk to its correct position in the end configuration (1 more move). Then, we move the top $n-1$ disks to their correct positions in the end configuration (at most $2^{n - 1} - 1$ moves). The whole procedure takes at most $2^{n-1} - 1 + 1 + 2^{n-1} - 1 = 2^n - 1$ moves. So, $T(n) \leq 2^n - 1$. This shows that $P(n)$ is true in this case.
Best Answer
The "word" description is in fact the guide to a proof. The only problem is showing that the process you describe is in fact optimal. This is done by induction.
we want to prove the formulas you give are the optimal solution.
The case $n=0$ is immediate.
Now: to prove the $k+1$ case assuming the $k$th case (so our induction hypothesis is that the formula you give is optimal for $R_k$ and for $Q_k$), you take a stack of $k+1$ disks. To move them all forward (to perform $Q_{k+1}$) you must get the top $k$ out of the way; that is done, by the induction assumption, with $R_k$ as optimal. Then you move the $k+1$ disk to the target, taking one move; then you move the disks from where they are to the target. Again, by the induction assumption, the best possible way of doing this is $R_k$. So you get that $Q_{k+1}=R_k+1+R_k=2R_k+1$, on the assumption that $Q_k$ and $R_k$ are the optimal moves.
The same argument works for $R_{k+1}$; the only thing that your "just words" is missing is to be clear about what the induction hypothesis is, and to apply it.
Added: Why is the given strategy actually the best? In order to move all the disks from the original rod to the target rod, we must move the largest disk at least once; note that the position of the largest disk does not affect the validity or lack thereof of any move, so the position of the largest disk is immaterial for any other moves (as opposed to any other disk). So the optimal solution will only require us to move the largest disk exactly once, from the original rod to the target rod. In order to do this, we must first remove all disks from above the largest disk, and have no disks on the target rod so we can perform the move; that is $R_k$, because the only configuration that allows us to move the largest disk from the original rod to the target rod is to have all other disks in the non-target rod. Then we move the largest disk, which is the $+1$. And now, we just want to move the remaining disks to their final position in the quickest possible way, which again is $R_k$. A similar argument focusing on placing the largest disk in its final position, works for the computation of $R_{k+1}$.