If it were indeed $PA = PLU$, then $P$ would be useless: just premultiply by $P^{-1} = P^T$ and you get $A = LU$. No, it has to be $PA = LU$.
You do it in steps:
Find the pivot row and get it to the first position, say by swapping it with the first row. This will give you a permutation $P_1$.
Do the first column elimination of $P_1 A$ to obtain
$$P_1 A = L_1 A_1,$$
where $L_1$ is unit lower triangular (ones on the diagonal) whose only nonzero off-diagonal entries are in the first column, while $A_1$ has zeros below the diagonal in its first column.
Repeat on $A_1[2:n,2:n]$ (i.e., $A_1$ without its first row and column) to obtain $P_2$, $L_2$, $A_2$, and so on.
When done, multiply (or smartly combine) all $P_k$ and $L_k$ (extended to dimension $n$ by identity) to get $P$ and $L$, respectively. You get $U$ by combining the first rows from all $A_k$.
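Putting these steps together, here is a minimal sketch in NumPy (the function name `lu_partial_pivot` is illustrative, not a library routine):

```python
import numpy as np

def lu_partial_pivot(A):
    """Return (P, L, U) with P @ A = L @ U, following the steps above."""
    n = A.shape[0]
    U = A.astype(float)   # working copy; becomes upper triangular
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        # Find the pivot: largest magnitude in column k, on or below the diagonal.
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p], :] = U[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]   # carry along stored multipliers
        # Eliminate below the pivot, storing the multipliers in column k of L.
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, U
```

A quick check of the factorization: `P, L, U = lu_partial_pivot(A)` followed by `np.allclose(P @ A, L @ U)` should return `True` for a nonsingular `A`.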
You should probably check out the Crout and LUP algorithms, which describe a similar, and I think somewhat better, approach.
A bit on the stability of LU
A question in the comments:
But you see, $PA = LU$ is the $LU$ decomposition of $PA$. I want the $LU$ decomposition of $A$. But since $A$ is not stable, from what I understood, I need to find $LU$ such that $A = LU$ by doing an LU factorization on $PA$. As in: find the $LU$ factorization of $A$ by running the $LU$ factorization of the stable $PA$. Does that not make sense?
Sensible as it may seem, it is not.
Take any $A$ such that $A_{1,1} = 0$ and you will not be able to do LU at all.
Now, take $A_{1,1}>0$, but very small, say $A_{1,1}=10^{-15}$, while all other elements of $A$ are non-zero integers. See what happens.
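Concretely, here is a small NumPy sketch of this effect (I use $10^{-17}$ rather than $10^{-15}$ so the failure is total; it matches the $10^{17}$ expression further below):

```python
import numpy as np

A = np.array([[1e-17, 1.0],
              [1.0,   2.0]])
b = np.array([1.0, 3.0])
# The exact solution is approximately x1 = x2 = 1.

# Naive elimination with the tiny pivot: the multiplier is 1e17.
m = A[1, 0] / A[0, 0]         # ~1e17
u22 = A[1, 1] - m * A[0, 1]   # 2 - 1e17 rounds to -1e17: the "2" is lost
y2 = b[1] - m * b[0]          # 3 - 1e17 also rounds to -1e17
x2 = y2 / u22                 # exactly 1.0
x1 = (b[0] - A[0, 1] * x2) / A[0, 0]
print(x1, x2)                 # 0.0 1.0 -- x1 is completely wrong

# Swapping the two rows first (partial pivoting) makes the multiplier 1e-17
# instead, and the result agrees with LAPACK's pivoted solver:
print(np.linalg.solve(A, b))  # [1. 1.] to machine precision
```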
This is where the instability comes from, and the result itself is unstable, so no algorithm can fix it: any algorithm would have to arrive at that same final factorization (the LU factorization of a nonsingular $A$, when it exists with unit diagonal in $L$, is unique).
By introducing partial pivoting, the row permutation on the left of $A$, you avoid this effect.
Effectively, these permutations only permute the equations of your system, nothing more. So your LU of $PA$ is really the Gaussian elimination algorithm performed on the same system of equations, just with the equations in a different order.
Similarly, partial pivoting of the columns would amount to permuting the variables (which introduces the need to track which variable gets which value in the end). Full pivoting permutes both the equations and the variables, giving the most stable way to solve the system.
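Concretely (a small NumPy sketch; the particular permutations are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)
x = np.linalg.solve(A, b)

# Row permutation = reordering the equations; the solution is unchanged.
P = np.eye(4)[[2, 0, 3, 1]]
assert np.allclose(np.linalg.solve(P @ A, P @ b), x)

# Column permutation = relabeling the variables; the solution comes back
# permuted, so you must undo the relabeling at the end.
Q = np.eye(4)[:, [1, 3, 0, 2]]
y = np.linalg.solve(A @ Q, b)
assert np.allclose(Q @ y, x)
```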
Variable-wise, all of these (when they can be performed) give you the same solution to the system of linear equations. But, in finite arithmetic, some solutions produce more errors than others. So, you'd rather get $x_1 = x_2$ than $x_1 = 10^{17}(1 + 10^{-17}x_2 - 1)$. Notice that the latter would actually give you $x_1 = 0$ if $x_2 \approx 1$, even though it is (in exact arithmetic) the same solution as $x_1 = x_2$.
That's why $A = LU$ itself can be a problem, not just the way you computed it.
This is also why diagonally dominant matrices are safe without pivoting: we only need to show that, after eliminating $a_{2,1}$, diagonal dominance is preserved, i.e.,
$$
\left|a_{2,2}-a_{1,2}{a_{2,1}\over a_{1,1}}\right|>\sum_{i=3}^n\left|a_{2,i}-a_{1,i}{a_{2,1}\over a_{1,1}}\right|,
$$
which is equivalent to
$$
|a_{2,2}a_{1,1}-a_{1,2}a_{2,1}|>\sum_{i=3}^n|a_{2,i}a_{1,1}-a_{1,i}a_{2,1}|.
$$
But this is true:
\begin{eqnarray*}
\sum_{i=3}^n|a_{2,i}a_{1,1}-a_{1,i}a_{2,1}|&\le&
|a_{1,1}|\sum_{i=3}^n|a_{2,i}|+|a_{2,1}|\sum_{i=3}^n|a_{1,i}| \\
&<& |a_{1,1}|(|a_{2,2}|-|a_{2,1}|)+|a_{2,1}|(|a_{1,1}|-|a_{1,2}|) \\
&=&|a_{1,1}||a_{2,2}|-|a_{2,1}||a_{1,2}|\\
&\le& |a_{1,1}a_{2,2}-a_{2,1}a_{1,2}|
\end{eqnarray*}
The same bound holds for every row below the first, and induction over the elimination steps completes the argument.
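A quick numerical sanity check of this fact (a sketch assuming NumPy; the test matrix is random and made strictly row diagonally dominant by inflating its diagonal):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A += np.diag(np.sign(np.diag(A)) * np.abs(A).sum(axis=1))  # force dominance

def is_row_diagonally_dominant(M):
    d = np.abs(np.diag(M))
    return np.all(d > np.abs(M).sum(axis=1) - d)  # |m_ii| > sum_{j != i} |m_ij|

assert is_row_diagonally_dominant(A)

# One elimination step in the first column, no pivoting.
A1 = A.copy()
for i in range(1, n):
    A1[i, :] -= (A1[i, 0] / A1[0, 0]) * A1[0, :]

# The trailing (n-1) x (n-1) block is again diagonally dominant.
assert is_row_diagonally_dominant(A1[1:, 1:])
```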
Best Answer
Rule of Thumb/TL;DR: When doing calculations with floating point numbers (such as the double, single, and float data types in many common programming languages), use partial pivoting unless you know you're safe without it, and use complete pivoting only when you know you need it.
Longer Explanation: A square matrix $A$ has an $LU$ factorization (without pivoting) if, and only if, no zero is encountered in the pivot position when computing an $LU$ factorization of $A$. However, when computing with floating point numbers, a pivot that is nearly zero can lead to dramatic rounding errors. The simple workaround is to always permute the rows of the matrix so that the largest nonzero entry in the column is chosen as the pivot entry. This ensures that a nearly zero pivot is never chosen. Complete pivoting goes even further by using row and column permutations to select the largest entry in the entire remaining matrix as the pivot entry.
The above paragraph is a loose, intuitive picture of why pivoting is necessary. One can also prove sharp error bounds by carefully tracking the propagation of errors throughout the entire $LU$ factorization calculation. One way of structuring such a bound is a so-called backward error estimate. In a backward error estimate for solving a linear system of equations $Ax = b$, one bounds the perturbation $E$ necessary to make the computed solution $\hat{x}$, produced by Gaussian elimination followed by back substitution, an exact solution to a nearby system of linear equations $(A+E)\hat{x} = b$. A backward error estimate can be very revealing if, for example, the matrix $A$ is determined by measurements of an engineering system with some error tolerance. If the entries of $A$ are only known to within $\pm 1\%$ and the backward error is less than $0.1\%$, then the numerical errors made during our computations are smaller than the measurement errors and we've done a good job. TL;DR: the quantity $E$ is a reasonable measure of the error in an $LU$ factorization and the resulting linear solve.
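For a computed solution $\hat{x}$, the smallest normwise backward error has a simple closed form (the Rigal–Gaches formula), so it is cheap to measure in practice. A sketch in NumPy:

```python
import numpy as np

def backward_error(A, x_hat, b):
    """Smallest eta such that (A + dA) @ x_hat = b + db holds for some
    perturbations with ||dA|| <= eta * ||A|| and ||db|| <= eta * ||b||
    (all infinity-norms); this is the Rigal-Gaches formula.
    """
    r = b - A @ x_hat  # residual
    return np.linalg.norm(r, np.inf) / (
        np.linalg.norm(A, np.inf) * np.linalg.norm(x_hat, np.inf)
        + np.linalg.norm(b, np.inf))
```

For a well-behaved solve, `backward_error(A, np.linalg.solve(A, b), b)` should come out near the unit roundoff, around $10^{-16}$ in double precision.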
For Gaussian elimination without pivoting, the backward error can be arbitrarily bad. Fortunately, for partial pivoting, the backward error $E$ can be bounded as $\|E\|_\infty \le 6n^3 \rho \|A\|_\infty u + \mbox{higher order terms in $u$}$.${}^\dagger$ Here, $\|\cdot\|_\infty$ is the matrix operator $\infty$-norm and $u$ is the unit roundoff, which quantifies the accuracy of floating point computations ($u \approx 10^{-16}$ for IEEE double precision arithmetic). The quantity $\rho$ is known as the growth factor for partial pivoting. While it is possible for $\rho$ to be as large as $2^{n-1}$, in practice $\rho$ usually grows very modestly with $n$.${}^\dagger$ Concerning the fact that $\rho$ grows very modestly in most applications, the legendary numerical analyst Kahan wrote, "Intolerable pivot-growth [with partial pivoting] is a phenomenon that happens only to numerical analysts who are looking for that phenomenon."${}^\$$
Nonetheless, one can write down matrices for which partial pivoting fails to give an accurate answer due to an exponential growth factor $\rho = 2^{n-1}$. Wilkinson showed that the growth factor for complete pivoting is much smaller in the worst case: $\rho \le n^{1/2} \left(2\cdot 3^{1/2}\cdot 4^{1/3}\cdots n^{1/(n-1)}\right)^{1/2}$. In practice, $\rho$ for complete pivoting is almost always less than $100$.${}^\dagger$
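The classic worst case is easy to reproduce (a sketch using SciPy; for this matrix partial pivoting performs no row swaps, and the last column doubles at every elimination step):

```python
import numpy as np
from scipy.linalg import lu

n = 60
# Wilkinson's example: 1 on the diagonal, -1 below it, 1 in the last column.
A = np.tril(-np.ones((n, n)), -1) + np.eye(n)
A[:, -1] = 1.0

P, L, U = lu(A)  # LU with partial pivoting
growth = np.abs(U).max() / np.abs(A).max()  # one common growth-factor measure
print(growth)    # 2**(n - 1), about 5.8e17 for n = 60
```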
Having delved into the details, you can see that this is a subtle business, and even in the classical field of numerical linear algebra there is somewhat of a gap between theory and experiment. In general, Gaussian elimination with partial pivoting is very reliable. Unless you know you can get away without pivoting (symmetric positive definite and diagonally dominant matrices are notable examples), you should use partial pivoting to get an accurate result. (Or compensate with something clever. For example, SuperLU uses "static pivoting", where one does "best guess" pivoting before starting the $LU$ factorization and then does no pivoting during the factorization. The loss of accuracy of this approach is compensated for by a few steps of iterative refinement.)
If partial pivoting isn't accurate enough, one can move to complete pivoting for its lower growth factor. As user3417 points out, there are ways of solving $Ax = b$ other than $LU$ factorization-based approaches, and these may be faster and more accurate than Gaussian elimination with complete pivoting. For example, $QR$ factorization runs in $O(n^3)$ operations and has no growth factor. There may be special cases when one truly does want an $LU$ factorization: for example, a Gaussian elimination-based approach can be used to construct structure-preserving factorizations of a Cauchy-like matrix. In this case, complete pivoting (or its close cousin, rook pivoting) may be the best approach.
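A rough sketch of the $QR$ alternative (assuming SciPy; `solve_triangular` does the back substitution):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

Q, R = qr(A)                      # A = QR: Q orthogonal, R upper triangular
x = solve_triangular(R, Q.T @ b)  # solve Rx = Q^T b
print(np.allclose(A @ x, b))      # True
```

Orthogonal transformations cannot grow the matrix entries, which is why there is no growth factor; the price is roughly twice the flops of $LU$.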
${}^\dagger$ Golub and Van Loan, Matrix Computations, Fourth Edition, Section 3.4.
${}^\$$ Quoted in Higham, Accuracy and Stability of Numerical Algorithms, Second Edition, Chapter 9.