For a start, the accepted usage of "rational canonical form" in the literature
is for a diagonal sum $C(f_1)\oplus C(f_2)\oplus\cdots\oplus C(f_k)$ where
$C(f_i)$ is the companion matrix of the monic polynomial $f_i$ and
$f_1\mid f_2\mid\cdots\mid f_k$. That said, if I needed to find a centralizer
explicitly, it isn't the canonical form I would choose.
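To make the convention concrete, here is a minimal sketch (my own, not part of the original answer) of a companion-matrix constructor, using the usual sign convention for the last column:

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of x^r + c_{r-1} x^{r-1} + ... + c_0,
    where coeffs = [c_0, ..., c_{r-1}]."""
    r = len(coeffs)
    C = np.zeros((r, r), dtype=int)
    C[1:, :-1] = np.eye(r - 1, dtype=int)  # 1's on the subdiagonal
    C[:, -1] = -np.asarray(coeffs)         # last column: minus the coefficients
    return C

print(companion([1, 1]))   # companion of x^2 + x + 1: [[0, -1], [1, 0 - 1]] -> [[0, -1], [1, -1]]
```

A rational canonical form in the above sense would then be `scipy.linalg.block_diag(companion(f_1), ..., companion(f_k))` with $f_1\mid\cdots\mid f_k$.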
As ever, we should think of $V=k^n$ as a $k[X]$-module where $X$ acts via $A$.
If the minimum polynomial of $A$ is $u_1^{a_1}\cdots u_k^{a_k}$ with the
$u_i$ distinct irreducibles, then $V$ splits uniquely into a direct sum of submodules
$M_1\oplus\cdots\oplus M_k$, where $M_i$ is annihilated by a power of $u_i$.
Both $A$ and its centralizer fix each $M_i$, so we may
reduce to the case where the minimum polynomial is $u^a$ with $u$ irreducible.
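As an illustration of this splitting, here is a sketch (mine; it works over $\mathbb{Q}$ for simplicity, whereas over a finite field one would factor with `modulus=p` and do the linear algebra mod $p$) that factors the characteristic polynomial and takes kernels:

```python
from sympy import Matrix, Poly, symbols

x = symbols('x')

def poly_at_matrix(p, A):
    """Evaluate the polynomial p at the square matrix A (Horner's rule)."""
    n = A.shape[0]
    R = Matrix.zeros(n)
    for c in p.all_coeffs():                 # highest-degree coefficient first
        R = R * A + c * Matrix.eye(n)
    return R

def primary_components(A):
    """Bases of the submodules M_i = ker u_i(A)^{a_i}, one for each
    distinct irreducible factor u_i of the characteristic polynomial."""
    cp = Poly(A.charpoly(x).as_expr(), x)
    for u, a in cp.factor_list()[1]:
        yield u.as_expr(), (poly_at_matrix(u, A) ** a).nullspace()

A = Matrix([[0, -1, 0], [1, -1, 0], [0, 0, 2]])   # min poly (x^2+x+1)(x-2)
for u, basis in primary_components(A):
    print(u, [list(v) for v in basis])
```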
Let $u$ have degree $r$ and let $C\in M_r(k)$ be the companion matrix of $u$,
or any conjugate of it. I claim that $A$ is conjugate to a diagonal sum of
"Jordan-style blocks" like
$$J=\left(\begin{matrix}
C&I&O&O\\
O&C&I&O\\
O&O&C&I\\
O&O&O&C
\end{matrix}\right).$$
This is just a matter of checking that such a matrix has the minimum polynomial $u^a$.
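Here is a quick computational check of that claim in one case (my own sketch, taking $u=x^2+x+1$ over $GF(2)$ and $a=3$):

```python
import numpy as np

def jordan_style(C, a, p):
    """a copies of C on the block diagonal, identity blocks on the
    superdiagonal, all entries reduced mod p."""
    r = C.shape[0]
    J = np.zeros((a * r, a * r), dtype=int)
    for i in range(a):
        J[i*r:(i+1)*r, i*r:(i+1)*r] = C
        if i + 1 < a:
            J[i*r:(i+1)*r, (i+1)*r:(i+2)*r] = np.eye(r, dtype=int)
    return J % p

def poly_at(coeffs, M, p):
    """Evaluate a polynomial (coefficients listed highest degree first)
    at the matrix M, mod p (Horner's rule)."""
    R = np.zeros_like(M)
    I = np.eye(M.shape[0], dtype=int)
    for c in coeffs:
        R = (R @ M + c * I) % p
    return R

p, a = 2, 3
C = np.array([[0, 1], [1, 1]])   # companion matrix of u = x^2 + x + 1 over GF(2)
J = jordan_style(C, a, p)
U = poly_at([1, 1, 1], J, p)     # u(J)
for k in (1, 2, 3):
    print(k, bool((np.linalg.matrix_power(U, k) % p).any()))
# prints True, True, False: u(J)^2 != 0 but u(J)^3 = 0, so the minimum
# polynomial of J is u^3.
```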
As we are working in $GL(n,p)$, we have $C\ne0$, and over a finite field the diagonal
sum of copies of $C$ is a polynomial in $A$ (indeed it is $A^{p^{rs}}$ for suitable $s$).
Thus each matrix in the centralizer of $A$ has a block decomposition into $r$-by-$r$
blocks which commute with $C$ and so are polynomials in $C$. So,
finding the centralizer of $A$ is equivalent to finding the centralizer of the matrix
$A'$ over $k'=k(\alpha)$ where $\alpha$ is a zero of $u$ and $A'$ is obtained by
replacing the above Jordan-style blocks by standard Jordan blocks
$$J'=\left(\begin{matrix}
\alpha&1&0&0\\
0&\alpha&1&0\\
0&0&\alpha&1\\
0&0&0&\alpha
\end{matrix}\right).$$
The centralizer of $A'$ is the same as that of $A'-\alpha I$ which is
a very sparse matrix. At this stage I would just find the centralizer in the matrix
algebra by explicit calculation and extract the nonsingular elements as the centralizer
in the matrix group.
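That explicit calculation can be organized as a nullspace computation for the linear map $X\mapsto AX-XA$. A sketch (mine, over $\mathbb{Q}$ with SymPy; over a finite field the same elimination is carried out mod $p$, and the invertible elements of the resulting algebra give the centralizer in the group):

```python
from sympy import Matrix

def centralizer_basis(A):
    """Basis of the commutant {X : AX = XA}, computed as the nullspace
    of the map X -> AX - XA on the n^2-dimensional space of matrices."""
    n = A.shape[0]
    cols = []
    for i in range(n):
        for j in range(n):
            E = Matrix.zeros(n)
            E[i, j] = 1
            cols.append(list(A * E - E * A))   # image of the basis matrix E_ij
    K = Matrix(cols).T                         # columns = images (row-major flattening)
    return [Matrix(n, n, list(v)) for v in K.nullspace()]

A = Matrix([[0, 1], [0, 0]])                   # a nilpotent Jordan block
for X in centralizer_basis(A):
    print(X)   # the basis {I, A}: the commutant is {a*I + b*A}
```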
Added (6/6/2010): I claimed that $J$ has the minimum polynomial $u^a$.
Let $k'=k(\alpha)$. The matrix $J'$ gives the action of $x$ on a standard
$k'$-basis for the $k'[x]$-module $k'[x]/((x-\alpha)^a)$. Therefore $J$
gives the action of $x$ on a $k$-basis for $k'[x]/((x-\alpha)^a)$ considered as a
$k[x]$-module. To see that $J$ has minimum polynomial $u^a$ it suffices
to show that some element of $k'[x]/((x-\alpha)^a)$ has annihilator $u^a$
over $k[x]$. But the element $1$ has annihilator $(x-\alpha)^a k'[x]$
over $k'[x]$ and so has annihilator
$k[x]\cap (x-\alpha)^a k'[x]=u^a k[x]$ over $k[x]$,
provided the extension $k'/k$ is separable (so that $x-\alpha$ is not a repeated
factor of $u$).
So the assertion holds for finite fields, but not necessarily for general
fields of prime characteristic. It would be interesting to work out the
details for inseparable irreducible polynomials.
OK, I think I have a full answer at this point, so let me post it.
Step 1. (algebra).
If $P$ is an $n\times n$ stochastic matrix and $\lambda$ is an eigenvalue of $P$ with $|\lambda|=1$, then $\lambda^k=1$ for some $k\le n$ and $1,\lambda,\lambda^2,\dots,\lambda^{k-1}$ are eigenvalues of $P$.
Indeed, let $x$ be an eigenvector. WLOG, $\lambda\ne 1$. Let $S=\{j:|x_j|=\max_k|x_k|\}$. If $i\in S$ and $p_{ij}>0$, then $x_j=\lambda x_i$ (in particular $j\in S$): the eigenvector equation $\lambda x_i=\sum_j p_{ij}x_j$ exhibits $\lambda x_i$, a point of maximal modulus, as a convex combination of the $x_j$, and strict convexity of the disk forces every $x_j$ with $p_{ij}>0$ to equal $\lambda x_i$. Thus, if we have some entry in $x$, we also have $\lambda^k$ times this entry for every $k$, but the number of different entries is at most $n$. Moreover, we can assume that one of the entries is $1$ and split the indices in $S$ into groups $S_m$ by the rule $j\in S_m$ iff $x_j$ is in the half-open counterclockwise arc from $\lambda^m$ to $\lambda^{m+1}$, so that $i\in S_m$, $p_{ij}>0$ imply $j\in S_{m+1}$. From here we immediately see that $P-\lambda^qI$ is not invertible for every $q$ (the $S$-block annihilates the vector $y_j=\lambda^{qm}$ for $j\in S_m$, and the full determinant has the determinant of the $S$-block as a factor). Thus, all powers of $\lambda$ are eigenvalues.
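A quick numerical illustration of Step 1 (my own example): a chain whose recurrent part is a $3$-cycle has all three cube roots of unity in its spectrum.

```python
import numpy as np

# States 0, 1, 2 form a deterministic 3-cycle; state 3 feeds into all
# states but nothing returns to it, so it only contributes the
# eigenvalue 1/4.
P = np.array([
    [0.0,  1.0,  0.0,  0.0 ],
    [0.0,  0.0,  1.0,  0.0 ],
    [1.0,  0.0,  0.0,  0.0 ],
    [0.25, 0.25, 0.25, 0.25],
])
vals = np.linalg.eigvals(P)
unit = sorted((v for v in vals if abs(abs(v) - 1) < 1e-9), key=np.angle)
print(np.round(unit, 6))   # all three cube roots of unity appear
```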
Step 2. (compactness argument).
Consider a convergent sequence $P_k$ of $n\times n$ stochastic matrices with limit $P$. Assume that $P_k$ has an eigenvalue $\lambda_k$ and that the $\lambda_k$ are not contained in any fixed Stolz angle. Then we may assume that $\lambda_k\to\lambda$ with $|\lambda|=1$. Clearly, $\lambda$ is an eigenvalue of $P$. If $\lambda\ne 1$, then $P$ has several eigenvalues summing to $0$ (the powers of $\lambda$), so $\operatorname{Tr} P\le n-2$, which makes it not a limit point of your set. But if $\lambda=1$, it is even worse, because if the $\lambda_k$ approach $1$ tangentially, then $\lambda_k^{m_k}$ can tend to any point on the unit circle, but these are also eigenvalues of $n\times n$ stochastic matrices (powers of $P_k$) and so are their limits. Thus, we have some fixed (but depending on $n$) Stolz angle containing all the eigenvalues of your matrices.
Step 3. (harmonic analysis).
Let $f(m)=\sum_{k=1}^n c_k\lambda_k^m$ for $m\ge 0$ and $f(m)=0$ for $m<0$, where the $\lambda_k$ lie in some fixed Stolz angle $A$. Then
$$
Vf=\sum_{m\in\mathbb Z}|f(m+1)-f(m)|\le C(A,n)\max_m |f(m)|
$$
Proof:
We begin with a
Complex analysis lemma
Let $F(z)=\sum_{k=1}^n c_k e^{\mu_k z}$ where $\mu_k\in\mathbb C$, $|\mu_k|\le 1$. Then $F$ has at most $C(n)$ zeroes in the unit disk.
Proof: Let $m$ be the maximum of $|F|$ over the unit disk. Then the first $n$ derivatives at the origin are bounded by $C(n)m$. But $\Phi(t)=F(zt)$ ($|z|=1$) satisfies an $n$-th order differential equation $\Phi^{(n)}=\sum_{k=0}^{n-1}b_k\Phi^{(k)}$ whose coefficients $b_k$ are obtained by expanding the polynomial $\prod_{k=1}^n (x-z\mu_k)$ and so are bounded by $2^n$, say. Standard ODE theory implies that $\Phi$ is bounded by $C'(n)m$ on $[-2,2]$, so the ratio of the maximum of $F$ over the disk of radius $2$ to the maximum over the unit disk is bounded, which is enough to control the number of zeroes in the unit disk (each Blaschke factor increases that ratio by a fixed factor). Rescaling and covering, we conclude that if $|\mu_k|\le\mu$, then there can be at most $C(n,K)$ zeroes in the disk of radius $K/\mu$.
Now,
Induction
If $n=1$, the claim is obvious: the maximum is just $|c_1|$, and $|\lambda_1^k-\lambda_1^{k+1}|\le C(A)(|\lambda_1|^k-|\lambda_1|^{k+1})$, which telescopes when summed over $k$.
Let $n>1$. Write $\lambda_k=e^{-\mu_k}$ with $|\mu_k|\le C(A)\operatorname{Re}\mu_k$. Let $\mu=\max_k|\mu_k|=|\mu_n|$. Note that $f$ is the restriction of $F(z)=\sum_{k=1}^n c_ke^{-\mu_k z}$ to the integers. The derivative of the real or imaginary part of $F(t)$ can have only $C(2n,K)$ zeroes on $[0,K/\mu]$, so the real and the imaginary parts have a bounded number of intervals of monotonicity there, whence $f$ has variation dominated by its maximum on $[0,K/\mu]$. Now, choose $K=K(A)$ so that $\gamma=\lambda_n^N$ has absolute value less than $1/2$, where $N\approx K/\mu$. The function $g(m)=f(m+N)-\gamma f(m)$ for $m\ge 0$ (and $0$ for $m<0$) is bounded by $2\max|f|$ and has one term fewer. Thus, by the induction assumption, $Vg\le C(n)\max|f|$.
To recover $f$ from $g$, note that $f(m)-\gamma f(m-N)=G(m)$ where $G(m)=g(m-N)$ for $m\ge N$ and $G(m)=f(m)$ for $m<N$. Note that $VG$ is still under control because we have bounded both $Vg$ and the part of $Vf$ corresponding to the interval $[0,N]$. Now it remains to iterate this recurrence to get $f(m)=G(m)+\gamma G(m-N)+\gamma^2 G(m-2N)+\dots$ and to use the shift invariance of, and the triangle inequality for, the total variation.
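As a sanity check of the bound just proved, here is a small numerical experiment (my own, with hand-picked $\lambda_k$ in the Stolz angle $|1-\lambda|\le 2(1-|\lambda|)$):

```python
import numpy as np

# Four frequencies in a fixed Stolz angle at 1.
lams = np.array([0.99, 0.95 * np.exp(0.03j), 0.9 * np.exp(-0.05j), 0.7])
c = np.array([1.0, -2.0, 1.5 + 1j, -0.5])
m = np.arange(20000)                          # enough terms: |lam|^20000 ~ 0
f = (c * lams ** m[:, None]).sum(axis=1)      # f(m) = sum_k c_k lam_k^m, m >= 0
V = abs(f[0]) + np.abs(np.diff(f)).sum()      # total variation (f = 0 for m < 0)
print(V / np.abs(f).max())                    # a bounded ratio, as claimed
```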
Step 4. (the end)
Each entry of the matrix $P^k$, viewed as a function of $k$, is of this form (assuming that the eigenvalues are distinct, which is a dense case). Thus, the total variation of each entry is bounded by some $C(n)$ depending on $n$ only. This is equivalent to $\sum_k\|P^k-P^{k+1}\|\le C(n)$, but the sequence of norms ($\ell^\infty$) is non-increasing, so $\|P^k-P^{k+1}\|=O_n(k^{-1})$.
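A final numerical check (my own, on a nearly periodic chain, where convergence is slow): $k\,\|P^k-P^{k+1}\|$ indeed stays bounded.

```python
import numpy as np

n = 6
cycle = np.roll(np.eye(n), 1, axis=1)            # deterministic n-cycle
P = 0.99 * cycle + 0.01 * np.ones((n, n)) / n    # nearly periodic stochastic matrix

for k in (1, 10, 100, 1000, 10000):
    d = np.abs(np.linalg.matrix_power(P, k)
               - np.linalg.matrix_power(P, k + 1)).max()   # entrywise sup norm
    print(k, round(k * d, 4))   # k * ||P^k - P^{k+1}|| stays bounded
```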
Feel free to ask questions :). I suspect all of this is written in some obscure textbook, but doing the literature search now is beyond my abilities.
One answer to your question is already hinted at in the question. At the level of algorithms, basis-independent vector spaces don't really exist. If you want to compute a linear map $L:V \to W$, then you're not really computing anything unless both $V$ and $W$ have a basis. This is a useful reminder in our area, quantum computation, that has come up in discussion with one of my students. In that context, a quantum algorithm might compute $L$ as a unitary operator between Hilbert spaces $V$ and $W$. But the Hilbert spaces have to be implemented in qubits, which then imply a computational basis. So again, nothing is being computed unless both Hilbert spaces have distinguished orthonormal bases. The reminder is perhaps more useful quantumly than classically, since serious quantum computers don't yet exist.
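To illustrate this in code (a toy example of mine, not from the original discussion): "computing" an abstract linear map means producing its matrix relative to chosen bases, and different bases give different matrices, while basis-independent quantities such as the trace agree.

```python
import numpy as np

def L(v):
    """The 'abstract' map: rotate the plane by 90 degrees."""
    x, y = v
    return np.array([-y, x])

def matrix_of(L, basis_V, basis_W):
    """Matrix of L with respect to chosen bases: column j holds the
    coordinates of L(basis_V[j]) in basis_W."""
    W = np.column_stack(basis_W)
    return np.column_stack([np.linalg.solve(W, L(b)) for b in basis_V])

std = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
skew = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]
M1 = matrix_of(L, std, std)     # [[0, -1], [1, 0]]
M2 = matrix_of(L, skew, skew)   # [[-1, -2], [1, 1]]: same map, different matrix
print(M1, M2, np.trace(M1) == np.trace(M2), sep="\n")
```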
On the other hand, when proving a basis-independent theorem, it is almost never enlightening (for me at least) to choose bases for vector spaces. The reason has to do with data typing: It is better to write formulas in such a way that the two sides of an incorrect equation are unlikely to even be of the same type. In algebra, there is a trend towards using bases as sparingly as possible. For instance, there is widespread use of direct sum decompositions and tensor decompositions as a way to have partial bases.
I think that your question about examples of proofs can't have an explicit answer. No basis-independent result needs a basis, and yet all of them do. If you have a reason to break down and choose a basis, it means that the basis-independent formalism is incomplete. On the other hand, anything that is used to build that formalism (like the definition of determinant and trace and the fact that they are basis-independent) needs a basis.
There is an exception to the point about algorithms. A symbolic mathematics package can have a category-theoretic layer in which vector spaces don't have bases. In fact, defining objects in categories is a big part of the interest in modern symbolic math packages such as Magma and SAGE.