Note that $T$ is surjective since for $a\in\Bbb R$ we have $T(A)=a$ where
$$
A=\begin{bmatrix}a & 0\\ 0 & 0\end{bmatrix}
$$
Of course, this implies $\{1\}$ is a basis for $\DeclareMathOperator{Image}{Image}\Image T$.
The Rank-Nullity theorem states
$$
\dim\ker T+\dim\Image T=\dim M_{2\times 2}
$$
Since $\Image T=\Bbb R$ and since
\begin{align*}
\dim\Bbb R &= 1 & \dim M_{2\times 2}&=4
\end{align*}
it follows that
$$
\dim\ker T=4-1=3
$$
So, to find a basis for $\ker T$, it suffices to find three linearly independent matrices in the kernel of $T$. It is easy to check that
\begin{align*}
\begin{bmatrix}
1 & 0 \\ 0 & -1
\end{bmatrix}
&&
\begin{bmatrix}
0 & 1 \\ 0 & 0
\end{bmatrix}
&&
\begin{bmatrix}
0 & 0 \\ 1 & 0
\end{bmatrix}
\end{align*}
are three such matrices.
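The answer never names $T$ explicitly. The Python sketch below *assumes* $T$ is the trace map $T(A)=a_{11}+a_{22}$. That assumption is consistent with $T(A)=a$ for the matrix $A$ above, but it is not stated in the answer. It checks that each of the three matrices has trace $0$, and that a trace-zero matrix $\begin{bmatrix}p&q\\r&-p\end{bmatrix}$ is recovered uniquely as $pM_1+qM_2+rM_3$, which is spanning plus linear independence. The names `trace`, `combine`, `decompose`, `M1`, `M2`, `M3` are illustrative only.

```python
# Hypothetical sketch: assumes T is the trace map on 2x2 real matrices.

def trace(m):
    """Assumed definition of T: the sum of the diagonal entries."""
    return m[0][0] + m[1][1]

# The three candidate kernel matrices from the answer.
M1 = ((1, 0), (0, -1))
M2 = ((0, 1), (0, 0))
M3 = ((0, 0), (1, 0))

def combine(a, b, c):
    """Return a*M1 + b*M2 + c*M3, computed entrywise."""
    return tuple(
        tuple(a * M1[i][j] + b * M2[i][j] + c * M3[i][j] for j in range(2))
        for i in range(2)
    )

def decompose(m):
    """Coordinates (a, b, c) of a trace-zero matrix m in the basis M1, M2, M3.

    Since a*M1 + b*M2 + c*M3 = [[a, b], [c, -a]], the coefficients are forced:
    a = m[0][0], b = m[0][1], c = m[1][0].
    """
    if trace(m) != 0:
        raise ValueError("matrix is not in the kernel")
    return m[0][0], m[0][1], m[1][0]

# Each candidate lies in the kernel.
for M in (M1, M2, M3):
    assert trace(M) == 0

# combine(a, b, c) == ((a, b), (c, -a)), so it is the zero matrix only when
# a = b = c = 0: the three matrices are linearly independent.
for a, b, c in [(1, 2, 3), (-4, 0, 9), (0, 0, 0)]:
    assert combine(a, b, c) == ((a, b), (c, -a))

# Spanning: an arbitrary kernel element is rebuilt from its coordinates.
K = ((5, -2), (7, -5))
assert combine(*decompose(K)) == K
```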
Instead of thinking of "the" basis of the kernel, you need to think of "a" basis of the kernel.
The kernel is a subspace of the domain. In general, it doesn't have only one basis; it has many.
For example, consider $T:\mathbb R^3 \to \mathbb R$ given by $T(x,y,z) = x+2y+3z.$ The kernel is the set of all points $(x,y,z)$ for which $x+2y+3z=0.$ If you pick $y$ and $z$ to be any numbers at all and then let $x = -2y-3z,$ then the resulting point $(x,y,z)$ is a member of the kernel of $T.$ The kernel contains infinitely many points because there are infinitely many values of $y$ and $z$ that you could have chosen.
Every basis of the kernel contains exactly two vectors, whereas the kernel itself contains infinitely many.
One basis of the kernel is this:
$$
\{ (-2, 1, 0),\ (-3,0,1) \}.
$$
The first of these points corresponds to the choice $y=1,$ $z=0.$ The second corresponds to $y=0$, $z=1.$
This is a basis for the kernel because every member of the kernel is a linear combination of these two vectors, and this set of two vectors is linearly independent.
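A small Python check of this claim, using exact integer arithmetic. `T` is the map $T(x,y,z)=x+2y+3z$ from the answer. `kernel_point` and `lin_comb` are illustrative names. The code builds kernel points from arbitrary $y,z$ as described above and confirms that each one equals $y(-2,1,0)+z(-3,0,1)$.

```python
def T(v):
    """T(x, y, z) = x + 2y + 3z."""
    x, y, z = v
    return x + 2 * y + 3 * z

B1 = (-2, 1, 0)  # the choice y = 1, z = 0
B2 = (-3, 0, 1)  # the choice y = 0, z = 1

def kernel_point(y, z):
    """Pick y and z freely, then set x = -2y - 3z."""
    return (-2 * y - 3 * z, y, z)

def lin_comb(s, u, t, w):
    """Return s*u + t*w componentwise."""
    return tuple(s * a + t * b for a, b in zip(u, w))

assert T(B1) == 0 and T(B2) == 0

# Every kernel point is y*B1 + z*B2 (spanning) ...
for y, z in [(1, 0), (0, 1), (4, -7), (-3, 5)]:
    p = kernel_point(y, z)
    assert T(p) == 0
    assert lin_comb(y, B1, z, B2) == p

# ... and the coefficients are unique: the y- and z-components of s*B1 + t*B2
# are s and t, so s*B1 + t*B2 = 0 forces s = t = 0 (independence).
assert lin_comb(2, B1, -5, B2)[1:] == (2, -5)
```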
Here is another basis of the kernel:
$$
\{(2,-1,0),\ (3,0,-1)\}.
$$
There are infinitely many different bases of the kernel, and each of them is a finite set, containing only two elements.
There is only one kernel, and it is an infinite set.
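One way to see the "infinitely many bases" claim concretely is the following hedged Python sketch. Starting from the basis $\{B_1,B_2\}=\{(-2,1,0),(-3,0,1)\}$, any coefficient matrix $\begin{bmatrix}a&b\\c&d\end{bmatrix}$ with $ad-bc\neq 0$ yields another basis $\{aB_1+bB_2,\ cB_1+dB_2\}$. The code checks that the new vectors lie in the kernel and are independent, using the fact that two vectors in $\mathbb R^3$ are linearly independent exactly when their cross product is nonzero. The helper names are hypothetical. For instance, $a=d=-1$, $b=c=0$ simply negates both basis vectors.

```python
def T(v):
    """T(x, y, z) = x + 2y + 3z."""
    x, y, z = v
    return x + 2 * y + 3 * z

B1, B2 = (-2, 1, 0), (-3, 0, 1)

def comb(s, u, t, w):
    """Return s*u + t*w componentwise."""
    return tuple(s * a + t * b for a, b in zip(u, w))

def cross(u, v):
    """Cross product in R^3; nonzero exactly when u and v are independent."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def new_basis(a, b, c, d):
    """Basis {a*B1 + b*B2, c*B1 + d*B2}; needs an invertible coefficient matrix."""
    if a * d - b * c == 0:
        raise ValueError("coefficient matrix is singular")
    return comb(a, B1, b, B2), comb(c, B1, d, B2)

for coeffs in [(1, 0, 0, 1), (-1, 0, 0, -1), (1, 1, 0, 1), (2, 3, -1, 4)]:
    u, v = new_basis(*coeffs)
    assert T(u) == 0 and T(v) == 0    # both vectors lie in the kernel
    assert cross(u, v) != (0, 0, 0)   # and they are linearly independent
```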
The definitions are there to highlight sets that are important to understanding the properties of the linear transformation $T$. Given $T:V\rightarrow W$, the kernel of $T$ is the set of all elements of $V$ that $T$ maps to $0$, and the range of $T$ is the set of all elements of $W$ of the form $T(v)$ for some $v\in V$. So, some simple examples:
Let $T:\mathbb{R} \rightarrow \mathbb{R}$ be given by $T(x)=x$. Then Ker$(T)=\{0\}$, because $T$ is the identity map, so the only element sent to $0$ is $0$ itself. Range$(T)$ is $\mathbb{R}$, because every $x\in\mathbb{R}$ is the image $T(x)$ of itself.
Let $T:\mathbb{R}^2 \rightarrow \mathbb{R}^2$ be given by $T(x,y) = (x+y, x-y)$. The kernel here is the set of all elements of $\mathbb{R}^2$ that map to $(0,0)$ under $T$. This means solving the simultaneous equations $x+y=0$ and $x-y=0$; adding them gives $2x=0$ and subtracting them gives $2y=0$, so $(0,0)$ is the only solution. So Ker$(T)=\{(0,0)\}$. Range$(T)$ is $\mathbb{R}^2$ again, because if you pick any target point $(\alpha, \beta)$ and solve the simultaneous equations $x+y=\alpha$ and $x-y=\beta$, you find $x=\frac12(\alpha+\beta)$ and $y=\frac12(\alpha-\beta)$, i.e. there is a point $(x,y)$ that $T$ sends to $(\alpha, \beta)$.
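A Python sketch of this example using exact fractions. `T` is $T(x,y)=(x+y,x-y)$, and the illustrative helper `preimage` solves $x+y=\alpha$, $x-y=\beta$ by adding and subtracting the two equations. Since that system has exactly one solution for every target, the check that `T(*preimage(...))` returns the target shows the range is all of $\mathbb R^2$, and the target $(0,0)$ shows the kernel is just the origin.

```python
from fractions import Fraction

def T(x, y):
    """T(x, y) = (x + y, x - y)."""
    return (x + y, x - y)

def preimage(alpha, beta):
    """Unique solution of x + y = alpha, x - y = beta."""
    x = Fraction(alpha + beta, 2)  # add the equations: 2x = alpha + beta
    y = Fraction(alpha - beta, 2)  # subtract them:     2y = alpha - beta
    return x, y

# Every target (alpha, beta) is hit, so Range(T) = R^2.
for target in [(0, 0), (3, 1), (-2, 5), (7, 7)]:
    assert T(*preimage(*target)) == target

# The only preimage of (0, 0) is (0, 0), so Ker(T) = {(0, 0)}.
assert preimage(0, 0) == (0, 0)
```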
What is the theorem telling you? It's telling you that these sets have structure: they are subspaces, not just random collections of points. In both the above examples the kernel consists of the origin alone, and so is a $0$-dimensional subspace. If we had an example where the kernel was bigger, it would have to have at least $1$ dimension (subspaces have integer dimensions), so it would be a line, or a plane, or a higher-dimensional hyperplane as the number of dimensions increases. In other words, all the elements that $T$ maps to zero are related to each other: they form a line (or plane, and so on) through the origin, and the kernel is exactly that set.
As an example here, consider $T:\mathbb{R}^2 \rightarrow \mathbb{R}^2$ given by $T(x,y) = (x-y, 0)$. The kernel of T is now $\{(x,y) \in \mathbb{R}^2 : x=y\}$. This is a line in $\mathbb{R}^2$, and T maps any point on it to $(0,0)$.
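The range has structure in the same way: it is a subspace of $W$ (which you should expect, since $T$ is linear and the range is defined through $T$). For the last example, the range is the $x$-axis $\{(t,0) : t\in\mathbb{R}\}$.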
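A quick Python check of the example $T(x,y)=(x-y,0)$. Points $(t,t)$ map to the origin, a point off the line $x=y$ does not, the kernel is closed under addition and scalar multiplication (as a subspace must be), and every output lies on the $x$-axis. The helper `in_kernel` is an illustrative name, not from the answer.

```python
def T(x, y):
    """T(x, y) = (x - y, 0)."""
    return (x - y, 0)

def in_kernel(p):
    """True when T sends the point p to the origin."""
    return T(*p) == (0, 0)

# Points on the line x = y are sent to the origin ...
line = [(t, t) for t in (-3, 0, 1, 2.5)]
assert all(in_kernel(p) for p in line)
# ... and points off the line are not.
assert not in_kernel((1, 0))

# The kernel is closed under addition and scalar multiplication.
p, q = (2, 2), (-5, -5)
assert in_kernel((p[0] + q[0], p[1] + q[1]))
assert in_kernel((4 * p[0], 4 * p[1]))

# Range: every output has second coordinate 0, and (a, 0) is hit by (a, 0).
for a in (-1, 0, 3):
    assert T(a, 0) == (a, 0)
```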
Note also that if the kernel of a linear transformation is just the zero element, then the transformation must be injective (one-to-one), which is often very useful to know: if $T(u)=T(v)$, then $T(u-v)=T(u)-T(v)=0$, so $u-v$ lies in the kernel, hence $u-v=0$ and $u=v$.
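For a matrix map $T(v)=Av$ on $\mathbb R^n$, Rank-Nullity says $\dim\ker T = n-\operatorname{rank}A$, so the kernel is $\{0\}$ (and $T$ is injective) exactly when $\operatorname{rank}A=n$. The sketch below computes the rank by Gaussian elimination over `fractions.Fraction`. It applies that test to the two $2\times2$ examples from this answer, and to the trace map on $M_{2\times 2}$ written as the row $[1,0,0,1]$ in coordinates $(a_{11},a_{12},a_{21},a_{22})$. That last case is an *assumption* about the first answer's $T$. The helper names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over Fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def has_trivial_kernel(rows):
    """ker T = {0} iff nullity = n - rank = 0, where n is the number of columns."""
    return len(rows[0]) - rank(rows) == 0

def apply(rows, v):
    """Matrix-vector product A v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in rows)

A = [[1, 1], [1, -1]]  # T(x, y) = (x + y, x - y)
B = [[1, -1], [0, 0]]  # T(x, y) = (x - y, 0)
C = [[1, 0, 0, 1]]     # assumed trace map on 2x2 matrices

assert has_trivial_kernel(A)                    # injective
assert not has_trivial_kernel(B)                # not injective ...
assert apply(B, (1, 1)) == apply(B, (0, 0))     # ... e.g. T(1, 1) = T(0, 0)
assert len(C[0]) - rank(C) == 3                 # dim ker = 4 - 1 = 3
```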