There is an elegant proof here: https://en.wikipedia.org/wiki/Rank_(linear_algebra)#First_proof
But working it out on an example is better for intuition.
There are 2 equivalent views of the relationships between the columns of a matrix:
- The column view: the columns as vectors in their own right
- The row view: each row as a tuple of coefficients, whose internal relations are what make some columns combinations of others
Take:
$$
A = \begin{bmatrix}1 & 2 & 3 & -2\\0 & 1 & 1 & -1\end{bmatrix}
$$
Note that the first 2 columns are independent and the next 2 columns are linear combinations of the first two. The structure looks like:
$$
A=\begin{bmatrix}\alpha_1 & \beta_1 & \gamma(\alpha_1,\beta_1) & \delta(\beta_1)\\\alpha_2 & \beta_2 & \gamma(\alpha_2,\beta_2) & \delta(\beta_2)\end{bmatrix}
$$
where $\gamma(x,y)=x+y$ and $\delta(y)=-y$
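As a quick sanity check (a minimal numpy sketch, not part of the original argument), we can verify that columns 3 and 4 of $A$ are built from columns 1 and 2 exactly as $\gamma$ and $\delta$ say:

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])
c1, c2, c3, c4 = A.T  # the rows of A.T are the columns of A

# gamma(x, y) = x + y : column 3 equals column 1 + column 2
print(np.array_equal(c3, c1 + c2))   # True
# delta(y) = -y : column 4 equals minus column 2
print(np.array_equal(c4, -c2))       # True
```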
The column space
The $columnspace$ $C(A)$ lives in $R^2$ and is the set of all $A\vec x$, where $\vec x=\pmatrix{\alpha&\beta&\gamma&\delta}^T$ is a vector of 4 coefficients applied to the columns:
$$
A\vec x = \alpha \pmatrix{1\\0}+\beta \pmatrix{2\\1}+\gamma\pmatrix{3\\1}+\delta\pmatrix{-2\\-1}\\ A\vec x = \alpha \pmatrix{1\\0}+\beta \pmatrix{2\\1}+\gamma\left[\pmatrix{1\\0}+\pmatrix{2\\1}\right]-\delta\pmatrix{2\\1} \\ A\vec x = (\alpha+\gamma) \pmatrix{1\\0}+(\beta+\gamma-\delta) \pmatrix{2\\1}
$$
So $C(A)$ is a 2-dimensional space that is spanned by the first 2 columns. Notice that only 2 coefficients out of 4 are really needed.
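Here is a small numpy sketch of that regrouping, using arbitrary (hypothetical) values for the 4 coefficients; it checks that $A\vec x$ really equals $(\alpha+\gamma)\,\text{col}_1 + (\beta+\gamma-\delta)\,\text{col}_2$:

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])
col1, col2 = A[:, 0], A[:, 1]

# arbitrary (hypothetical) coefficient values
alpha, beta, gamma, delta = 2.0, -1.0, 3.0, 0.5
x = np.array([alpha, beta, gamma, delta])

lhs = A @ x                                                    # combination of all 4 columns
rhs = (alpha + gamma) * col1 + (beta + gamma - delta) * col2   # only 2 columns needed
print(np.allclose(lhs, rhs))                                   # True
```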
Looking at the constraints on coefficients
The vector space of the coefficients $\vec x=\pmatrix{\alpha&\beta&\gamma&\delta}^T$ lives in $R^4$, but we should ask ourselves why column 3 is a linear combination of columns 1 and 2. It is because in every row, the 3^rd^ entry (the $\gamma$) is the same linear combination ($\alpha+\beta$) of the first 2 entries of that row. If you added another row to $A$ whose 3^rd^ entry ($\gamma$) is not exactly the sum of the first 2, then column 3 would no longer be a linear combination of columns 1 and 2. Likewise, column 4 is a linear combination of the first 2 columns only because in every row the 4^th^ entry is the opposite of the 2^nd^ entry.
So now we have another, strictly equivalent way to say that columns 3 and 4 are linear combinations of columns 1 and 2, but in terms of rows:
All rows $(\alpha,\ \beta,\ \gamma,\ \delta)$ have the same structure $(\alpha,\ \beta,\ \alpha+\beta,\ -\beta)$
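The same constraint can be checked row by row; a minimal numpy sketch (illustration only) confirming that in each row the 3^rd^ entry is $\alpha+\beta$ and the 4^th^ is $-\beta$:

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])

for row in A:
    a, b, g, d = row
    # 3rd entry is alpha + beta, 4th entry is -beta
    print(g == a + b, d == -b)   # True True for every row
```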
The row space
Note that the sum of 2 rows with structure $(\alpha,\ \beta,\ \alpha+\beta,\ -\beta)$ still has this structure, and that multiplying by a scalar also keeps it. So every vector in the $rowspace$, denoted $C(A^T)$, has this structure:
$$
\pmatrix{\alpha\\ \beta\\ \alpha+\beta \\-\beta} = \alpha\pmatrix{1\\0\\1\\0} + \beta\pmatrix{0\\1\\1\\-1}
$$
The whole $rowspace$ is generated by 2 vectors and hence has dimension at most 2. In particular, $row\ 1 = 1\cdot(1,0,1,0) + 2 \cdot (0,1,1,-1)$ and $row\ 2 = 0\cdot(1,0,1,0) + 1 \cdot (0,1,1,-1)$. Because of the placement of the zeros (which form an identity matrix), we can see that those 2 vectors are independent.
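A quick numpy check of this decomposition (illustrative only): the coefficients of each row on the 2 basis vectors are simply its first 2 entries.

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])
v1 = np.array([1, 0, 1, 0])
v2 = np.array([0, 1, 1, -1])

for row in A:
    a, b = row[0], row[1]        # the first 2 entries are the coefficients
    print(np.array_equal(row, a * v1 + b * v2))   # True for both rows
```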
Sum up
So let's go back to the proof that $dim(rowspace) = dim(colspace)$, applied to our matrix $A$:
- We start with $dim(colspace)=r=2$ in $A$. We suppose this is because column 3 is some linear combination $\gamma(x,y)$ of columns 1 and 2, and column 4 is some linear combination $\delta(x,y)$ of the same columns 1 and 2
- Columns 3 and 4 are combinations of columns 1 and 2 exactly because all rows have the structure $(\alpha,\ \beta,\ \gamma(\alpha,\beta),\ \delta(\alpha,\beta))$, where $\gamma = c_1\alpha + c_2\beta$ and $\delta = k_1\alpha + k_2\beta$
- The $rowspace$ is precisely the set of all 4-d vectors with structure $(\alpha,\ \beta,\ c_1\alpha + c_2\beta,\ k_1\alpha + k_2\beta)$, which is $\alpha(1,\ 0,\ c_1,\ k_1) + \beta(0,\ 1,\ c_2,\ k_2)$
- Because the $rowspace$ can be generated by $r$ vectors, its dimension is at most $r=2$, so that $dim(rowspace) \leq r = dim(colspace)$
- Now you can either apply the same argument to $A^T$ (to get $dim(colspace) \leq r = dim(rowspace)$), or notice that the 2 $rowspace$ vectors are trivially independent because of the placement of the zeros (see the quick check after this list).
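Here is the quick check mentioned above: a numpy sketch that simply asks for the rank of $A$ and of $A^T$. It confirms the conclusion numerically, without replacing the argument.

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])

print(np.linalg.matrix_rank(A))    # 2 = dim(colspace)
print(np.linalg.matrix_rank(A.T))  # 2 = dim(rowspace), same value
```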
It's easy to generalize.
If a collection of columns is linearly independent (respectively linearly dependent), then it remains so under elementary row operations. Therefore elementary row operations do not change the largest set of linearly independent columns.
To see this, notice that a linear dependence relation between some
columns of a matrix $A$ is given by a nonzero column vector $v$ with $Av=0$.
If $E$ is an elementary matrix and $B=EA$ then $Bv=EAv=0$; conversely
if $Bv=0$ then $Av=E^{-1}Bv=0$. Therefore if a set of columns
of $A$ is linearly dependent then the corresponding set of columns
of $B$ is also linearly dependent and vice versa. This means $A$ and $B$
have the same column rank.
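A small numpy illustration of that argument, using the dependence $\text{col}_3 = \text{col}_1 + \text{col}_2$ from the example matrix and one (arbitrarily chosen) elementary row operation:

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])

# v encodes the dependence  col1 + col2 - col3 = 0,  i.e.  A v = 0
v = np.array([1, 1, -1, 0])
print(np.allclose(A @ v, 0))    # True

# One elementary row operation: add 2 * (row 2) to row 1
E = np.array([[1, 2],
              [0, 1]])
B = E @ A
print(np.allclose(B @ v, 0))    # True: the same columns of B are still dependent
```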
Best Answer
You can apply elementary row operations and elementary column operations to bring a matrix $A$ to a matrix that is in both row reduced echelon form and column reduced echelon form. In other words, there exist invertible matrices $P$ and $Q$ (which are products of elementary matrices) such that $$PAQ=E:=\begin{pmatrix}I_k & 0\\ 0 & 0\end{pmatrix}.$$ As $P$ and $Q$ are invertible, the maximum number of linearly independent rows in $A$ is equal to the maximum number of linearly independent rows in $E$. That is, the row rank of $A$ is equal to the row rank of $E$. Similarly for the column ranks. Now it is evident that the row rank and column rank of $E$ are identical (to $k$). Hence the same holds for $A$.
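For the $2\times 4$ example matrix used earlier, here is a concrete numpy check of this factorization; the specific $P$ and $Q$ below were worked out by hand for that $A$, so treat them as illustrative rather than as a general recipe:

```python
import numpy as np

A = np.array([[1, 2, 3, -2],
              [0, 1, 1, -1]])

# P performs the row operation  R1 <- R1 - 2*R2
P = np.array([[1, -2],
              [0,  1]])

# Q performs the column operations  C3 <- C3 - C1 - C2  and  C4 <- C4 + C2
Q = np.array([[1, 0, -1, 0],
              [0, 1, -1, 1],
              [0, 0,  1, 0],
              [0, 0,  0, 1]])

E = P @ A @ Q
print(E)                          # [[1 0 0 0]
                                  #  [0 1 0 0]]  -> I_2 padded with zero columns
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(E))   # True, both equal 2
```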