Here is the argument given in section 1.3 of Gregory F. Lawler's Introduction to Stochastic Processes. It treats stochastic matrices $P$,
but I think the argument applies to general non-negative matrices.
For each state $i$ define $J_i=\{n: P^n(i,i)>0\}$. This set is closed under addition, since $P^{n+m}(i,i)\geq P^n(i,i)\,P^m(i,i)$, so it is a semigroup; and since we have assumed that $P$ is aperiodic, we have $\gcd(J_i)=1$, from which it follows (a standard fact about numerical semigroups) that $J_i$ contains all sufficiently large integers.
That is, there is some integer $M(i)$ so that for all $n\geq M(i)$ we have
$P^n(i,i)>0$.
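That number-theoretic fact is easy to check numerically; here is a small sketch (the generators $\{3,5\}$ are an illustrative choice, not tied to any particular $P$):

```python
# Sanity check of the fact used above: a set of positive integers that is
# closed under addition and has gcd 1 contains all sufficiently large n.
# Illustrative generators {3, 5}: the largest integer not of the form
# 3a + 5b (a, b >= 0) is 3*5 - 3 - 5 = 7, so the threshold M is 8.

def additive_closure(gens, limit):
    """All positive integers up to `limit` that are sums of the generators."""
    reachable = {0}
    for n in range(1, limit + 1):
        if any(n - g in reachable for g in gens):
            reachable.add(n)
    reachable.discard(0)
    return reachable

J = additive_closure({3, 5}, 100)
print(7 in J, all(n in J for n in range(8, 101)))   # False True
```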
Since $P$ is irreducible, there exists some $m(i,j)$ such that $P^{m(i,j)}(i,j)>0$.
Hence for $n\geq M(i)$,
$$P^{n+m(i,j)}(i,j)\geq P^n(i,i)\,P^{m(i,j)}(i,j)>0.$$
Let $M$ be the maximum value of $M(i)+m(i,j)$ over all pairs $(i,j)$. Then
for $n\geq M$, $P^n(i,j)>0$ for all states $i,j$.
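To see the conclusion in action, here is a quick numerical sketch (the 3-state chain below is an illustrative example, not one from the book):

```python
import numpy as np

# Sketch of the conclusion: for an irreducible aperiodic stochastic P,
# every entry of P^n is positive once n is large enough.  The 3-state
# chain below (0 -> 1 -> 2, with 2 -> 0 or 2 -> 2) is an illustrative
# example; the self-loop at state 2 makes it aperiodic.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.0, 0.5]])

Pn = np.eye(3)
first_positive = None
for n in range(1, 50):
    Pn = Pn @ P
    if (Pn > 0).all():
        first_positive = n   # smallest M with P^M entrywise positive
        break
print(first_positive)        # 4 for this chain
```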
Essentially the same argument is found in section 1.3 of
Markov Chains and Mixing Times by Levin, Peres, and Wilmer. So it looks like probabilists have not found a better proof.
First of all, let me recall the definition of a reducible matrix that I was given at university:
A matrix $A\in\mathbb{C}^{n\times n}$ is reducible if there exists a permutation matrix $\Pi$ such that $\Pi\cdot A\cdot\Pi^T$ is in block upper triangular form. In other words, if it can be placed into such a form by simultaneous row and column permutation, to use Wolfram's phrasing.
However, Wolfram (compare here) has a different opinion. The definition given there is:
A matrix is reducible if there are two disjoint sets of indexes $I,J$ with $|I|=\mu$, $|J|=\nu$, $\mu+\nu=n$ such that for every $(i,j)\in I\times J$ we have $a_{ij}=0$.
Or the same thing phrased differently. Wolfram then states the equivalence of these two definitions, and the equivalence of the Disjoint Index Set Condition (the condition in Wolfram's definition, henceforth DISC) with the negation of the strongly connected graph property (henceforth SCG). My definition of a reducible matrix, i.e. the Upper Triangular Permutation Condition, will henceforth be referred to as UTPC, and the negations of these conditions as n-SCG, n-UTPC, n-DISC, with n for "not". Now, one problem I had was the definition of block UT form, which I defined as:
A matrix is block UT if it has a block of zeroes in the bottom-left corner and non-zero blocks elsewhere.
With such a definition, a full matrix with a lone zero in the bottom-left corner would be block UT, but it wouldn't satisfy the DISC (at least for $n\geq3$, since $\mu+\nu=n$ would force more entries than that lone one to vanish), so I had to review my definition. I came to think it might be:
A matrix is block UT if its diagonal is covered by square blocks whose diagonals are pieces of the matrix's diagonal, and below those blocks there are only zeroes.
I'll try to add a picture to visually represent this. Right now I have no time; I hope not to forget about this. Assuming that this is the definition, it is quite evident that block UT form implies the DISC (for any split point between diagonal blocks, take $I$ to be the indexes below the split and $J$ those above it), and conversely a matrix satisfying the DISC is in block UT form once the indexes are relabelled so that those of $I$ come last. So if I prove that permutations as in the UTPC do not alter the satisfaction of the DISC, then I have proved that DISC and UTPC are equivalent, as Wolfram states. This will be the first step. Subsequently, I will have to prove the equivalence of DISC and n-SCG. Then, by transitivity of equivalence, I will have answered my question and proved this theorem.
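As a quick sanity check of the DISC itself, it can be tested by brute force over all index splits; a sketch in Python (the matrix `B` is an illustrative block UT example):

```python
from itertools import combinations

# Brute-force test of the DISC: is there a split of the indexes into
# nonempty disjoint I, J (covering everything) with a[i][j] == 0 for all
# i in I, j in J?  Exponential in n, so a sketch for small matrices only.
def has_disc(a):
    n = len(a)
    for mu in range(1, n):
        for I in combinations(range(n), mu):
            J = [j for j in range(n) if j not in I]
            if all(a[i][j] == 0 for i in I for j in J):
                return True
    return False

# An illustrative block UT matrix (zero block bottom-left) satisfies it
# with I = {1, 2} and J = {0}; a matrix with no zeroes cannot.
B = [[1, 1, 1],
     [0, 1, 1],
     [0, 1, 1]]
print(has_disc(B), has_disc([[1, 1], [1, 1]]))   # True False
```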
Now, what happens when I apply a simultaneous row and column permutation? Let's restrict ourselves to the simple case of just two rows and the corresponding columns being swapped, since any permutation matrix $\Pi$ can be written as a product of such simple ("elementary") transposition matrices, and its transpose is the product of the transposes in reverse order, so applying $\Pi\cdot A\cdot\Pi^T$ is like applying the elementary swaps one after the other; and if a single swap doesn't alter the DISC, then a succession of them clearly won't either. What happens is basically a renaming of the indexes. First suppose we swap $i,j$ with neither in $I$, i.e. $i,j\in J$. Then the same index sets still satisfy the condition: $a_{ki}=a_{kj}=0$ for all $k\in I$ still holds after the columns are swapped, and the entries $a_{ik}$, $a_{jk}$ don't matter since $i,j\notin I$. Likewise, if $i,j\in I$, then $a_{ik}=a_{jk}=0$ for all $k\in J$ remains true after the row and column swap, so the DISC stays unchanged. Finally, if $i\in I$ and $j\in J$, then for $\Pi\cdot A\cdot\Pi^T$ we will have $j$ in place of $i$ in $I$ and $i$ in place of $j$ in $J$. That is because before swapping we had $a_{ik}=0$ for all $k\in J$, $j$ included, and $a_{kj}=0$ for all $k\in I$, $i$ included; swapping row $i$ with row $j$ moves the zeroes of row $i$ into row $j$, swapping column $i$ with column $j$ moves the zeroes of column $j$ into column $i$, and the zero in place $(i,j)$, brought by the row swap into place $(j,j)$, gets moved by the column swap to place $(j,i)$. Altogether, the exchanged sets $(I\setminus\{i\})\cup\{j\}$ and $(J\setminus\{j\})\cup\{i\}$ satisfy the condition for the permuted matrix. So the proof is concluded.
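This relabelling argument is easy to sanity-check numerically: apply every simultaneous row/column permutation to a matrix with the DISC and confirm the condition survives. A sketch (the matrix `A` is an illustrative example):

```python
from itertools import combinations, permutations

def has_disc(a):
    """Disjoint Index Set Condition, tested by brute force (small n only)."""
    n = len(a)
    for mu in range(1, n):
        for I in combinations(range(n), mu):
            J = [j for j in range(n) if j not in I]
            if all(a[i][j] == 0 for i in I for j in J):
                return True
    return False

def conjugate(a, p):
    """Simultaneous row/column permutation: (Pi A Pi^T)[i][j] = a[p[i]][p[j]]."""
    n = len(a)
    return [[a[p[i]][p[j]] for j in range(n)] for i in range(n)]

# Illustrative DISC matrix (I = {1, 2}, J = {0}); every simultaneous
# permutation of it should still satisfy the DISC, with relabelled sets.
A = [[1, 1, 1],
     [0, 1, 1],
     [0, 1, 1]]
print(all(has_disc(conjugate(A, p)) for p in permutations(range(3))))   # True
```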
Proving the equivalence of DISC and n-SCG seems easier than proving UTPC $\iff$ n-SCG, which is why I chose this path. Suppose the DISC holds. Then we have $\mu$ nodes, associated with the indexes in $I$, from which no edge leads to any of the remaining $\nu$ nodes associated with the indexes in $J$. Since every node belongs to either $I$ or $J$, a path from a node of $I$ to a node of $J$ would have to contain such an edge at some step, so no node of $J$ is reachable from any node of $I$ at all, and the graph is definitely not strongly connected.
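The reachability claim can be illustrated with a small BFS sketch (the matrix `A` below is an illustrative DISC example):

```python
from collections import deque

# Sketch of the reachability argument: in the graph of a matrix with the
# DISC (a[i][j] == 0 for i in I, j in J), a search from any node of I
# stays inside I, so no node of J is reachable and the graph is not SC.
def reachable(a, start):
    """Nodes reachable from `start`, with an edge i -> j wherever a[i][j] != 0."""
    n = len(a)
    seen, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if a[i][j] != 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

# Illustrative matrix satisfying the DISC with I = {2, 3}, J = {0, 1}:
A = [[1, 1, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
print(reachable(A, 2))   # {2, 3} -- the search never escapes I
```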
The converse is a bit trickier. I have proved it in special cases, and then assumed the same line of thought can be used for any size, but the proof is anyway long and very inelegant, so any suggestion of a more elegant proof is very welcome indeed. Instead of proving that n-SCG implies DISC, I tried to prove that n-DISC implies SCG, because the former didn't seem to lead me anywhere, while the latter gave me something that seemed to work in every case I checked.

Let's start with $n=2$. Then we have a $2\times2$ matrix and 2 nodes. Take one of them: it can't be cut off from the other one, otherwise we would have the DISC. So from this node, call it $A$, we can reach the other one, call it $B$. The same argument applied to $B$ proves we can reach $A$ from $B$, and therefore the graph is SC.

Let's go on with $n=3$. We start at one node, call it $A$. From it, as said above, we can reach at least one other node, call it $B$. From $B$ we can reach at least one other node. One is tempted to say it is the last node, $C$, but of course it need not be. So we have two cases: either $A\to B\to C$, or $A\leftrightarrow B$. In the former case, from $C$ we can reach at least one node. If it is $A$, we have concluded. If it is $B$, i.e. $A\to B\leftrightarrow C$, then from either $B$ or $C$ we can reach $A$, since otherwise the DISC would be satisfied with $I=\{B,C\}$ and $J=\{A\}$, contradicting our assumption. If $A\to B\leftrightarrow C\to A$, we have concluded, and if $A\leftrightarrow B\leftrightarrow C$, we have also concluded. However, we still have the latter case, $A\leftrightarrow B$. Of course, from one of those two we must reach $C$. So $A\leftrightarrow B\to C$ or $C\leftarrow A\leftrightarrow B$. In the former case, we have $A\to B\to C$ in particular, and we have shown above that this case leads to a positive conclusion. In the latter case, from $C$ we can reach either $B$ or $A$, since otherwise the DISC would again be satisfied, against the supposed n-DISC. If $C\to B$, then $A\to C\to B\to A$, which concludes the proof. If $C\to A$, then $C\leftrightarrow A\leftrightarrow B$, which also concludes the proof.

I tried $n=4$, but got bored in the process, concluded one positive case and left many cases uninvestigated. Such a proof is very long, boring and inelegant, and sounds also very nerdy with all the cases it considers, especially as $n$ grows. I wonder how the number of cases grows with $n$; perhaps exponentially or factorially. From 2 to 3 we have seen an enormous change. In any case, I sure hope a more elegant proof exists and someone knows it and posts it here as a second answer. If that doesn't happen, I will isolate this bit and ask another question, and link that question in a second answer here. I will wait at most 12 days before doing so. Of course, marking that as a duplicate when I've had to answer this would sound totally absurd :). Anyway, it seems reasonable to assume that the given proofs are error-free and that this last proof, with all its bad qualities, can be extended, with lots of patience, to greater dimensions. I wonder if it is possible to prove it by induction on $n$. I will not think about that, as I have already thought about this problem even too much: it was a theorem given in Numerical Calculus without proof, I won't be asked for its proof in an exam, and this definitely suffices to get an extra point if I'm good enough to get asked the "30L" question, and besides I have many other proofs and exercises to worry about. So my writing here is concluded, save maybe for the other question linked to this one in a possible second answer, if I am forced to ask such a question.
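Since the hand case analysis explodes as $n$ grows, here is a brute-force computational check of the equivalence over all $3\times3$ zero/one patterns; it is only a sketch for small $n$, not a proof, but it at least confirms the $n=3$ case analysis above:

```python
from itertools import combinations, product

def has_disc(a):
    """DISC by brute force: nonempty I, J partitioning the indexes with
    a[i][j] == 0 for all i in I, j in J."""
    n = len(a)
    for mu in range(1, n):
        for I in combinations(range(n), mu):
            J = [j for j in range(n) if j not in I]
            if all(a[i][j] == 0 for i in I for j in J):
                return True
    return False

def strongly_connected(a):
    """Every node reaches every node (edge i -> j wherever a[i][j] != 0)."""
    n = len(a)
    def reach(s):
        seen, stack = {s}, [s]
        while stack:
            i = stack.pop()
            for j in range(n):
                if a[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen
    return all(len(reach(s)) == n for s in range(n))

# Exhaustive check over all 2^9 = 512 zero/one patterns for n = 3:
# n-DISC holds exactly when the graph is strongly connected.
n = 3
assert all(has_disc(a) == (not strongly_connected(a))
           for bits in product([0, 1], repeat=n * n)
           for a in [[list(bits[r * n:(r + 1) * n]) for r in range(n)]])
print("n-DISC <=> SCG checked for every 3x3 zero/one matrix")
```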
Best Answer
Let $$A = \left[\begin{array}{ccc} 0 &1& 0\\ 0 &0 &1\\ 1 &0& 0 \end{array}\right],$$ then $$ A^1 = \left[\begin{array}{ccc} 0 &1& 0\\ 0 &0 &1\\ 1 &0& 0 \end{array}\right]\quad A^2 = \left[\begin{array}{ccc} 0 &0& 1\\ 1 &0 &0\\ 0 &1& 0 \end{array}\right]\quad A^3 = \left[\begin{array}{ccc} 1 &0& 0\\ 0 &1 &0\\ 0 &0& 1 \end{array}\right], $$ so for every pair $\langle i,j \rangle$ there is a power $m$ such that $(A^m)_{i,j} > 0$, and the matrix is irreducible. However, this is not a magical statement: worded in graph-theoretic form, $(A^m)_{i,j} > 0$ exactly when there is a directed path of length $m$ from vertex $i$ to vertex $j$, so such a power exists for every pair $\langle i,j \rangle$ only if the graph is strongly connected. Moreover, take a reducible matrix $B$:
$$B = \left[\begin{array}{cc} B_1 &B_2\\ 0 &B_3 \end{array}\right]$$
and consider its corresponding graph. Observe that there is no edge from vertices of block $B_3$ to vertices of block $B_1$. In other words, once you get to the $B_3$ part, there is no way back, and in particular the graph cannot be strongly connected.
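A concrete numerical instance of this observation (the $3\times3$ matrix below, with $B_1$ of size $1\times1$, $B_2$ of size $1\times2$ and $B_3$ of size $2\times2$, is an illustrative choice):

```python
import numpy as np

# Illustrative reducible matrix in the block form above: the zero block
# sits bottom-left.  Every power keeps that zero block (products of block
# UT matrices are block UT), so the B3 vertices {1, 2} never reach vertex 0.
B = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

Bn = np.eye(3)
for m in range(1, 11):
    Bn = Bn @ B
    assert Bn[1, 0] == 0 and Bn[2, 0] == 0   # B^m keeps the zero block
print("no power of B has a positive (i, 0) entry for i = 1, 2")
```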
I hope this helps $\ddot\smile$