The core of studying matrices is studying linear transformations between vector spaces. These can be realized as multiplication by a matrix on the left of a column vector (or on the right of a row vector).
If we are in the setup $x\mapsto Ax$ for a column vector $x$ and a matrix $A$ of compatible size, then the image of the linear transformation is spanned by the columns of $A$.
The kernel of the transformation (the nullspace), i.e. the set of all $x$ such that $Ax=0$, is important for understanding the solutions of matrix equations. You have probably already learned that if $x_0$ is a solution of $Ax=b$, then every solution is given by $x_0+k$, where $k$ is in the nullspace.
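For instance, here is a quick numerical check of that solution structure (a minimal sketch using NumPy; the particular matrix and vectors are made up for illustration):

```python
import numpy as np

# Hypothetical example: A maps R^3 -> R^2 and has a one-dimensional nullspace.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([6.0, 2.0])

# One particular solution x0 (least-squares gives an exact solution here,
# since b lies in the column space of A).
x0 = np.linalg.lstsq(A, b, rcond=None)[0]

# k is in the nullspace: A @ k = 0.
k = np.array([-1.0, -1.0, 1.0])
assert np.allclose(A @ k, 0)

# Every x0 + t*k is again a solution of A x = b.
for t in (0.0, 1.0, -2.5):
    assert np.allclose(A @ (x0 + t * k), b)
```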
All of this has an analogous explanation on the other side. If we are instead in the setup $x\mapsto xA$ for a row vector $x$, then the image of the linear transformation is spanned by the rows of $A$.
Talking about the nullspace of $A^T$ is just a fancy way of dressing up the "left nullspace" of $A$, since $xA=0$ iff $A^T x^T=0$. The nullspace is now the set of all row vectors $x$ such that $xA=0$, and you can draw the same conclusions about solutions of $xA=b$.
In short, these four spaces (really just two spaces, an image and a kernel, each with a left and a right version) carry all the information about the image and kernel of the linear transformation that $A$ effects, whether you are using it on the right or on the left.
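A small sketch of all four spaces at once (assuming NumPy and SciPy are available; the matrix is an arbitrary rank-deficient example):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # dependent row, so the rank drops
              [0.0, 1.0, 1.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Column space (image of x -> Ax) and row space (image of x -> xA)
# both have dimension r.
nullity = null_space(A).shape[1]          # kernel of x -> Ax
left_nullity = null_space(A.T).shape[1]   # kernel of x -> xA, via xA=0 iff A^T x^T = 0

assert nullity == n - r
assert left_nullity == m - r
```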
So in general, suppose we have a linear transformation $\tau:W \rightarrow V$ between vector spaces over some field $\mathbb{F}$, with $\dim(W)=n$ and $\dim(V)=m$.
The tricky thing is that vectors in the kernel are in $W$, while vectors in the image are in $V$. In the proof of the rank-nullity theorem we make the "connection" via another theorem stating that the image of a basis spans the image: if $\beta$ is a basis for $W$, then $\tau(\beta)$ spans $\tau(W)$. If you do not understand this, it's best to first study that theorem (a proof is given at the end of this answer).
Now start the proof by taking a basis $\alpha=\{\alpha_1,\ldots,\alpha_k\}$ for the kernel of $\tau$, which is a subspace of $W$. We can always extend this to a basis for $W$ itself (another theorem), so let's extend it by adding vectors $\beta=\{\alpha_{k+1},\ldots,\alpha_n\}$ so that $\alpha \cup \beta$ is a basis for $W$. As mentioned in the previous paragraph, $\tau(\alpha \cup \beta)$ then spans $\tau(W)$. But we can do better: since $\tau(\alpha_1)=\cdots=\tau(\alpha_k)=0$, in fact $\tau(\beta)$ alone spans $\tau(W)$. This is the first part of the proof.
What we have to prove now is that $\tau(\beta)$ is a linearly independent set. We start in the usual way, from the definition of linear independence: suppose we have scalars $a_{k+1},\ldots,a_n$ such that \begin{equation} \sum_{i=k+1}^{n}a_i \tau(\alpha_i)=0, \end{equation} then by the linearity of $\tau$: \begin{equation} \tau\left(\sum_{i=k+1}^{n} a_i \alpha_i \right)=0. \end{equation} But this means $\sum_{i=k+1}^{n} a_i \alpha_i$ is in the kernel of $\tau$. Let $v=\sum_{i=k+1}^{n} a_i \alpha_i$. Since $v$ is in the kernel of $\tau$ and $\alpha$ is a basis for this kernel, we can find scalars $a_1,\ldots,a_k$ such that $v=\sum_{i=1}^{k} a_i \alpha_i$. And now we use a simple "trick": \begin{equation} v-v= \sum_{i=1}^{k} a_i \alpha_i-\sum_{i=k+1}^{n} a_i \alpha_i=0,\end{equation} and since $\alpha \cup \beta$ is a basis, and hence linearly independent, we must have $a_1=\cdots=a_n=0$. In particular $a_{k+1}=\cdots=a_n=0$, which proves that $\tau(\beta)$ is linearly independent.
So putting it all together: $\{\alpha_1,\ldots,\alpha_k\}$ is a basis for the kernel of $\tau$ (by construction), and we have proved $\{\tau(\alpha_{k+1}),\ldots,\tau(\alpha_{n})\}$ is a basis for the image of $\tau$, so \begin{equation} \dim(\ker\tau)+\dim(\operatorname{im}\tau)=k+(n-k)=n=\dim(W), \end{equation} which is what we wanted.
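None of the following is part of the proof, but if a concrete check helps, the construction can be replayed numerically (a sketch assuming SciPy, with a greedy stand-in for the basis-extension theorem):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])
n = A.shape[1]

alpha = null_space(A)          # columns: a basis {alpha_1,...,alpha_k} of ker(tau)
k = alpha.shape[1]

# Extend to a basis of W = R^n: append standard basis vectors and keep
# a maximal independent subset (a crude stand-in for the extension theorem).
basis = alpha
candidates = np.hstack([alpha, np.eye(n)])
for j in range(alpha.shape[1], candidates.shape[1]):
    trial = np.hstack([basis, candidates[:, [j]]])
    if np.linalg.matrix_rank(trial) > np.linalg.matrix_rank(basis):
        basis = trial
beta = basis[:, k:]            # the added vectors {alpha_{k+1},...,alpha_n}

# tau(beta) should be a basis of the image: independent, with k + (n-k) = n.
assert np.linalg.matrix_rank(A @ beta) == n - k == np.linalg.matrix_rank(A)
```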
Let me know if anything is unclear, and I will try to expand on whatever specific request you might have.
Edit - As requested, here is a proof of the first theorem (the image of a basis spans the image):
Suppose $\beta=\{\beta_1,\ldots,\beta_n\}$ is a basis for $W$, and consider any vector $w \in W$. It can be expressed in terms of the basis $\beta$ as \begin{equation} w=\sum_{i=1}^n b_i \beta_i \end{equation} for some scalars $b_i$. Then $\tau(w) \in \tau(W)$, and by the linearity of $\tau$ we have \begin{equation} \tau(w) = \tau\left(\sum_{i=1}^n b_i \beta_i \right) = \sum_{i=1}^n b_i\tau(\beta_i). \end{equation} Since every vector in $\tau(W)$ is of the form $\tau(w)$ for some $w \in W$, this means any vector in $\tau(W)$ can be expressed as a linear combination of the vectors in $\tau(\beta)$, and therefore $\tau(\beta)$ spans $\tau(W)$.
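Concretely, if $W=\mathbb{R}^n$ and $\beta$ is the standard basis, then $\tau(\beta_i)$ is the $i$-th column of $A$, and the computation above is just matrix multiplication (a minimal NumPy check; the matrix and coefficients are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, -1.0, 4.0])   # coefficients b_i of w in the standard basis

# tau(beta_i) is the i-th column of A; sum b_i * tau(beta_i) equals tau(w) = A @ w.
combo = sum(b_i * A[:, i] for i, b_i in enumerate(b))
assert np.allclose(combo, A @ b)
```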