The factor $k$ should be clear.
To compute $a^{n-1}$ one can use repeated squaring and multiplication, which requires one or two modular multiplications per bit of $n$. As the number of bits is $\log n$, we get a factor of $\log n$.
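As a sketch of that repeated-squaring idea (a toy Python version; the function name is mine, and Python's built-in three-argument `pow(a, n-1, n)` does the same job):

```python
def mod_exp(a, e, n):
    """Compute a**e mod n by repeated squaring: one squaring per bit
    of e, plus one extra multiplication for each 1-bit."""
    result = 1
    base = a % n
    while e > 0:
        if e & 1:                      # current bit is 1: multiply in the base
            result = (result * base) % n
        base = (base * base) % n       # square for the next bit
        e >>= 1
    return result

# Fermat-style check with n = 341 (a base-2 pseudoprime):
# mod_exp(2, 340, 341) returns 1 even though 341 = 11 * 31.
```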
A modular multiplication can be done by adding up to $\log n$ bit-shifted copies of a factor. Each addition involves numbers with at most twice as many bits as $n$, so it takes $O(\log n)$ steps; the reduction modulo $n$ could, for example, be done using $\log n$ precomputed values, one for each of the "higher" bits, which at least does not hurt the $O(\log n)$ bound for the addition.
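A hedged sketch of such a shift-and-add modular multiplication (the function name and the choice to reduce after every step are mine, not from the text; reducing eagerly keeps every intermediate value below $2n$, which stands in for the precomputed-value trick above):

```python
def mod_mul(x, y, n):
    """Multiply x*y mod n by adding bit-shifted copies of x.
    Each of the up to log2(y) one-bits of y contributes one addition."""
    result = 0
    x %= n
    while y > 0:
        if y & 1:                  # this bit of y contributes a copy of x
            result = (result + x) % n
        x = (x + x) % n            # shift x one bit left (double it) mod n
        y >>= 1
    return result
```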
With this simple method, we thus obtain
$$O(\underbrace{k}_{\text{repeats}}\times\underbrace{\log n}_{\text{expon.}}\times\underbrace{\log n}_{\text{mult.}}\times \underbrace{\log n}_{\text{add}})=O(k\times \log^3n).$$
To improve by a factor of $\frac{\log\log n\times\log\log\log n}{\log n}$, one needs to employ slightly smarter methods (I guess FFT or something like that).
Suppose that $x$ and $y$ are $n$ bits long. Then $x+x=2x$ is at most $n+1$ bits long, $2x+x=3x$ is at most $n+2$ bits long, and in general $jx$ is at most $n+j-1$ bits long. In particular, $xy$ is at most $n+n-1=2n-1$ bits long. The values of $z$ generated by this algorithm are the numbers $jx$ for $j=0,1,\ldots,y$, and we’ve just seen that they all have fewer than $2n$ bits. Addition and subtraction of two numbers of at most $2n$ bits each take at most $2kn$ time for some positive constant $k$, so each iteration of the loop takes at most $2kn$ time.
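The loop being analyzed presumably looks something like this (a guess at the original pseudocode, since I don't have it in front of me; the variable names $z$, $x$, $y$ follow the text):

```python
def slow_mul(x, y):
    """Multiply by repeated addition: z runs through the values
    0, x, 2x, ..., yx, so the loop body executes exactly y times."""
    z = 0
    while y > 0:
        z = z + x      # z grows by x each pass; stays below 2n bits
        y = y - 1
    return z
```

With $n$-bit inputs the loop can run up to $2^n-1$ times, which is exactly where the exponential bound in the next paragraph comes from.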
We go through the loop $y$ times. The largest $n$-bit number is $2^n-1$, so we go through the loop fewer than $2^n$ times, and the total time is therefore bounded above by $2kn\cdot2^n$. That is, if $t(x,y)$ is the time required for inputs of $x$ and $y$, we have $t(x,y)<2kn\cdot2^n$. It follows that if $T(n)$ is the maximum running time over all inputs $\langle x,y\rangle$ such that $x$ and $y$ are both at most $n$ bits long, then $0\le T(n)<2kn\cdot2^n$ for all $n\in\Bbb Z^+$. Since $2k$ is a fixed positive constant, this means precisely that $T(n)$ is $O(n2^n)$, which is exponential, not polynomial.
Note: I’ve neglected overhead: some time is spent in initialization and output. That overhead is essentially constant, however, so $T(n)$ is bounded by $c+2kn\cdot2^n$ for some constant $c$, and since $n2^n$ is unbounded, it’s not hard to verify that $c+2kn\cdot2^n$ is also $O(n2^n)$. This is routine, which is why it isn’t even mentioned in the solution that you have.
I can't read that code, but when you want to raise something to the power $n$, you can write $n$ out in binary, and then for every zero in that binary expansion you have to do a squaring, and for every one in the expansion you have to do a square and a multiply. So the number of operations you have to do is at most twice the number of bits in the binary expansion of $n$. But the number of bits in the binary expansion of $n$ is, essentially, $\log_2n$.
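That operation count can be checked directly. Here is a small sketch of my own (not the code from the question) that tallies the squarings and multiplies in a left-to-right scan of the binary expansion:

```python
def exp_op_count(n):
    """Count the squarings and multiplications used by left-to-right
    square-and-multiply exponentiation to the power n (n >= 1)."""
    bits = bin(n)[2:]          # binary expansion, most significant bit first
    ops = 0
    for b in bits[1:]:         # the leading 1-bit costs nothing
        ops += 1               # one squaring per remaining bit
        if b == '1':
            ops += 1           # plus a multiply when that bit is 1
    return ops

# exp_op_count(13) -> 5: 13 = 1101 in binary, giving the chain
# square-multiply, square, square-multiply after the leading bit.
```

As the bound above says, the count never exceeds twice the number of bits of $n$, i.e. roughly $2\log_2 n$.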