In the framework of the Gaussian elimination procedure, you get $L$ with all $1$s on the diagonal by never truly rescaling rows. You rescale them in the intermediate step before you add them to another row, but then you "scale them back" so that you're only changing the row that you actually added to. You can always perform Gaussian elimination in this way. (By contrast, you may not always be able to perform Gaussian elimination without exchanging rows to obtain your next pivot, so an $LU$ decomposition may not always exist.)
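To make this concrete, here is a minimal numpy sketch of that construction (the function name `lu_no_pivot` is my own; it assumes every pivot that arises is nonzero, so no row exchanges are needed):

```python
import numpy as np

def lu_no_pivot(A):
    """Gaussian elimination without row exchanges or row rescaling.

    Instead of rescaling rows, each elimination step records the
    multiplier it used in L, so L comes out with all 1s on the
    diagonal.  Assumes every pivot is nonzero (otherwise an LU
    decomposition need not exist).
    """
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for r in range(k + 1, n):
            m = U[r, k] / U[k, k]     # multiplier: how much of row k to subtract
            U[r, k:] -= m * U[k, k:]  # row r <- row r - m * row k
            L[r, k] = m               # record the multiplier; diagonal stays 1
    return L, U

A = np.array([[2.0, 1.0],
              [6.0, 4.0]])
L, U = lu_no_pivot(A)
# L has unit diagonal, U is upper triangular, and L @ U == A
```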
Your proof states $P_j$ is a permutation matrix. There is no explicit restriction on what $P_j$ permutes, but there is an implicit one, which has to do with the order in which row reduction steps are done.
Since $j > i$, the permutation $P_j$ is performed after the lower-triangular steps $L_1, L_2, \ldots, L_{j-1}$, so $P_j$ can only permute rows $j+1$ and greater (rows $1$ through $j$ have already been taken care of by $L_1, L_2, \ldots, L_{j-1}$). So, if $P_j$ swaps rows $q$ and $r$, we have $q, r > j \ge s > i$.
Basically, this implicit restriction on $P_j$ is what leaves $\tilde L_i = L_i$: the entries of $P_j$ and $P_j^T$ agree with the identity in rows/columns $1$ through $j$, so conjugating by $P_j$ only shuffles rows and columns beyond $j$, and these never touch the nonzero entry of $M$ at position $(s, i)$ (since $s \le j$ and $i < j$).
FWIW, the fact that this detail goes unmentioned makes the proof fairly poorly constructed, in my opinion, and all of the $\tilde L$ notation makes it difficult to parse. A cleaner approach would be to show directly that $L_i$ and $P_j$ commute.
Example. We will take
- $i = 1$ (column of the nonzero element of $M$)
- $s = 2$ (row of the nonzero element of $M$)
- $j = 2$ (row reduction step at which the permutation $P_2$ occurs)
- $q = 3$ (one of the two rows swapped by $P_2$)
- $r = 4$ (the other row swapped by $P_2$)
A possible choice of $L_1$ and $P_2$ is as follows.
$$L_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \hspace{1 in} P_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{bmatrix}$$
From these, we can derive $M = L_1 - I$ and $P_2^T$.
$$M = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \hspace{1 in} P_2^T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{bmatrix}$$
Now, compute $P_2 MP_2^T$ and notice that it equals $M$ (the swapped rows and columns $3$ and $4$ of $M$ are all zero).
$$P_2 MP_2^T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0\end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = M$$
Therefore
$$\tilde L_1 = P_2 L_1 P_2^T = P_2 (I + M) P_2^T = I + P_2 MP_2^T = I + M = L_1,$$
which is lower triangular.
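This can also be checked numerically; a short numpy sketch using the $L_1$ and $P_2$ from the example above:

```python
import numpy as np

# L_1 and P_2 from the example above
L1 = np.array([[1, 0, 0, 0],
               [2, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])
P2 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])

# Conjugating L1 by P2 leaves it unchanged, i.e. tilde-L1 = L1 ...
tilde_L1 = P2 @ L1 @ P2.T
assert np.array_equal(tilde_L1, L1)

# ... which is equivalent to saying that L1 and P2 commute
assert np.array_equal(P2 @ L1, L1 @ P2)
```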
Best Answer
There's a variant of Gaussian elimination where you don't move the pivots until after you have cleared the entries below them. For example, in the $2 \times 2$ case, if you have $$A = \begin{pmatrix} 0 & 2 \\ 1 & 4 \end{pmatrix},$$ the first step in Gaussian elimination would be to swap the two rows. But in the variant, you do only 'L' moves first. In each column, the entry you choose as the pivot is the same one Gaussian elimination would choose (the highest non-zero entry that does not occur in the row of a previous pivot), but you don't move it. You can still clear all the entries below each pivot, so after you finish the forward phase (minus row swapping), a single row permutation brings the matrix to row echelon form. This gives $PLA = U$; multiplying on the left by $P^{-1}$ and then by $L^{-1}$ yields $A = L^{-1}P^{-1}U$, an $LPU$ decomposition (the factors $L^{-1}$ and $P^{-1}$ are again lower triangular and a permutation, respectively).
To illustrate with $A$, $$ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} $$ so $$ A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. $$
(Of course, in this example, there's a simpler $PU$ decomposition, but the general algorithm hinted at above works in general.)
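Both the forward phase and the final factorization are easy to verify numerically; a quick numpy check, with the matrices copied from the worked example:

```python
import numpy as np

A = np.array([[0, 2],
              [1, 4]])

# Forward phase: P (L_elim A) = U, where L_elim is the single 'L' move
L_elim = np.array([[1, 0],
                   [-2, 1]])
P = np.array([[0, 1],
              [1, 0]])
U = np.array([[1, 0],
              [0, 2]])
assert np.array_equal(P @ L_elim @ A, U)

# The resulting LPU decomposition: A = L P U, with L = L_elim^{-1}
L = np.array([[1, 0],
              [2, 1]])
assert np.array_equal(L @ P @ U, A)
```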