To distinguish between points and tangent vectors, let $p=(p_1,...,p_n)\in \mathbb{R}^n$ be a point of $\mathbb{R}^n$ and $v=(v_1,...,v_n)\in T_p(\mathbb{R}^n)$ a vector in the tangent space $T_p(\mathbb{R}^n)$.
The line through $p=(p_1,...,p_n)\in \mathbb{R}^n$
with direction $v=(v_1,...,v_n)\in T_p(\mathbb{R}^n)$ has parametrization $a(t)=(p_1+tv_1,...,p_n+tv_n)$.
If $f$ is $C^\infty$ in a neighborhood of $p\in \mathbb{R}^n$ and $v$ is a tangent vector at $p$, define the directional derivative of $f$ in the direction of $v$ at $p$ as
$$D_vf=\lim\limits_{t \to 0} \frac{f(a(t))-f(p)}{t}.$$
By the multi-variable chain rule, we have $$D_vf=\sum_{i=1}^{n} \frac{da^i}{dt}(0)\frac{\partial f}{\partial x^i}(p)=\sum_{i=1}^{n} v_i\frac{\partial f}{\partial x^i}(p).$$
Of course, in the notation $D_vf$ the partial derivatives are evaluated at $p$, since $v$ is a vector at $p$. Now we can define a map $D_v$ (which assigns to every function $f$ that is $C^\infty$ near $p$ the real number $D_vf$) in the natural way
$$D_v=\sum_{i=1}^{n} v_i\frac{\partial }{\partial x^i}(p)=\sum_{i=1}^{n} v_i\frac{\partial }{\partial x^i}\Bigr\rvert_{p}.$$
This map $D_v$ is in fact a derivation at $p$, i.e. $D_v\in \mathcal{D}_p(\mathbb{R}^n)$: it is $\mathbb{R}$-linear and satisfies the Leibniz rule $D_v(fg)=(D_vf)\,g(p)+f(p)\,(D_vg)$.
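If you want to sanity-check the chain-rule formula against the limit definition, here is a quick sympy computation (the function $f=x^2y$, the point $p=(1,2)$, and the vector $v=(3,4)$ are my own choices for illustration):

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
f = x**2 * y                      # an arbitrary smooth test function
p = (1, 2)                        # base point
v = (3, 4)                        # tangent vector at p

# Chain-rule formula: D_v f = sum_i v_i * (∂f/∂x^i)(p)
chain_rule = sum(vi * sp.diff(f, xi).subs({x: p[0], y: p[1]})
                 for vi, xi in zip(v, (x, y)))

# Limit definition: D_v f = lim_{t->0} (f(a(t)) - f(p)) / t
a_t = f.subs({x: p[0] + t*v[0], y: p[1] + t*v[1]})
limit_def = sp.limit((a_t - f.subs({x: p[0], y: p[1]})) / t, t, 0)

print(chain_rule, limit_def)  # both equal 16
```

Both computations give the same number, as the chain rule predicts.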
Finally, you can show that the map
\begin{align}
\phi :T_p(\mathbb{R}^n) &\to \mathcal{D}_p(\mathbb{R}^n) \\
v &\mapsto D_v
\end{align}
is a linear isomorphism of vector spaces (for surjectivity, you can use Taylor's theorem).
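Linearity of $\phi$, i.e. $D_{au+bv}=aD_u+bD_v$, can also be checked symbolically; the test function, point, and vectors below are my own choices:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x) * y                 # an arbitrary smooth test function
p = {x: 1, y: 2}                  # base point

def D(v, g):
    """Derivation D_v g = sum_i v_i * (∂g/∂x^i)(p)."""
    return sum(vi * sp.diff(g, xi).subs(p) for vi, xi in zip(v, (x, y)))

u, v = (1, -1), (2, 5)
a, b = 3, -2
lhs = D((a*u[0] + b*v[0], a*u[1] + b*v[1]), f)   # D_{au+bv} f
rhs = a * D(u, f) + b * D(v, f)                  # a D_u f + b D_v f
print(sp.simplify(lhs - rhs))  # 0, so the map v -> D_v is linear on f
```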
So the answers to your questions are:
3) Yes, you can see every tangent vector $v \in T_p(\mathbb{R}^n)$ as a derivation $D_v\in \mathcal{D}_p(\mathbb{R}^n)$ using the isomorphism $\phi$.
2) Since $e_1,...,e_n$ is the canonical basis of $T_p(\mathbb{R}^n)$ and $\phi$ is an isomorphism, $\phi(e_1),...,\phi(e_n)$ is a basis of $\mathcal{D}_p(\mathbb{R}^n)$. But $\phi(e_i)=\frac{\partial }{\partial x^i}\Bigr\rvert_{p}$, hence $\{\frac{\partial }{\partial x^i}\Bigr\rvert_{p}\}_{i=1}^n$
is a basis of the tangent space
$\mathcal{D}_p(\mathbb{R}^n)\simeq T_p(\mathbb{R}^n)$.
1) As a result, you can say that $\{\frac{\partial }{\partial x^i}\Bigr\rvert_{p}\}_{i=1}^n$ is a basis of $T_p(\mathbb{R}^n)$. You can say all that because of the isomorphism $\phi$.
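One concrete consequence of this identification: applying $D_v$ to the coordinate functions $x^j$ recovers the components of $v$, which is exactly the statement that $v=\sum_i v_i\frac{\partial}{\partial x^i}\bigr\rvert_p$. A small check (my own choice of point and vector):

```python
import sympy as sp

x, y = sp.symbols('x y')
p = {x: 5, y: 7}                  # any base point
v = (3, 4)                        # so D_v = 3 ∂/∂x|_p + 4 ∂/∂y|_p

def D_v(g):
    """D_v g = sum_i v_i * (∂g/∂x^i)(p)."""
    return sum(vi * sp.diff(g, xi).subs(p) for vi, xi in zip(v, (x, y)))

# Applying D_v to the coordinate functions recovers the components of v:
print(D_v(x), D_v(y))  # 3 4
```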
To begin, the span of a set of vectors is just the set of linear combinations of them. It's also the smallest vector (sub-)space that contains all of the vectors in question. A set of vectors spanning $W$ need not be linearly independent (for example, using the vectors you defined in the question, $u$, $v$ and $u+v$ together also span $W$). Linearly independent sets of vectors that span a subspace form a basis for it that, for the most part, behaves very similarly to the standard basis for, say, $\mathbb{R}^n$: you can express any vector in the subspace in question as a unique linear combination of the basis vectors.
Essentially, selecting a basis is like selecting coordinates for the vector space. The axes you choose don't have to be perpendicular to one another, but they still allow you to uniquely represent every point in your space. It's very convenient if the axes for your coordinate system are also perpendicular to one another, and even more so if your basis vectors also have length 1, because then things look exactly like Euclidean spaces... but that's not necessary for the question at hand (it's just to maybe help you develop some intuition).
There's a lot of freedom to choose a basis for a vector space. For example, in $\mathbb{R}^2$, any two non-zero vectors that are not parallel constitute a basis. Similarly, in your example, you could choose many other basis sets, but the ones you chose are perfectly fine!
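To illustrate the $\mathbb{R}^2$ claim with a quick computation (the two vectors below are my own arbitrary choice of non-zero, non-parallel vectors):

```python
import sympy as sp

# Two non-zero, non-parallel vectors in R^2
u = sp.Matrix([1, 2])
v = sp.Matrix([3, 1])

A = sp.Matrix.hstack(u, v)        # columns are u and v
print(A.rank())                   # 2, so {u, v} is a basis of R^2

# Any vector then has a unique expression in this basis:
w = sp.Matrix([5, 5])
coeffs = A.solve(w)               # unique solution since A is invertible
print(coeffs.T)                   # c with c[0]*u + c[1]*v = w
```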
I think the piece you're missing to show that the vectors you chose, $u$ and $v$, form a basis is just the statement that $\dim W = 2$ (if $\dim W=n$, you need $n$ linearly independent vectors to span it). Here's how the logic plays out: Since $u$ and $v$ are two vectors in $W$, they span a subspace of $W$ (i.e., ${\rm span} \{u,v\}\subset W$). Since they are two linearly independent vectors, the dimension of this subspace is 2. Lastly, because $\dim W=2$, the only subspace of $W$ that has dimension 2 is $W$ itself, so ${\rm span}\{u,v\} = W$.
An elementary way to see that $\dim W = 2$ is to note that $\dim W \ge2$ because you found two linearly independent vectors in it (namely, $u$ and $v$), but that there are vectors in $\mathbb{R}^3$ that are not in $W$ (e.g., $(1,0,0)$), so $\dim W < \dim \mathbb{R}^3 = 3$. This implies $\dim W = 2$.
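The elementary argument above can be checked numerically. The question's $u$ and $v$ aren't reproduced here, so I take two representative linearly independent vectors in $W=\{(x,y,z): x+y+z=0\}$ (the definition of $W$ used below):

```python
import sympy as sp

# W = {(x, y, z) : x + y + z = 0}; representative choices for u, v in W:
u = sp.Matrix([1, -1, 0])
v = sp.Matrix([0, 1, -1])

print(sum(u), sum(v))                 # 0 0: both vectors lie in W
print(sp.Matrix.hstack(u, v).rank())  # 2: u, v independent, so dim W >= 2
# (1,0,0) is not in W since 1+0+0 != 0, so dim W < 3, hence dim W = 2.
```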
Another, more involved, way to see that $\dim W =2$ is to use the rank-nullity theorem. To do that, we want to build a linear map $T: U\to V$ such that its kernel is $W$. $W$ is a subspace of $\mathbb{R}^3$, so the map will go from $U = \mathbb{R}^3$. Here's a simple one with $V=\mathbb{R}$:
\begin{equation}
T\begin{pmatrix}
x \\
y \\
z
\end{pmatrix}
= x + y + z\,.
\end{equation}
By the very definition of $W$, $\ker T = W$. Because $T$ is not the zero map, its rank must be no smaller than 1, and it cannot be greater than $\dim V = 1$, so ${\rm rank\,} T =1$. The rank-nullity theorem tells us that ${\rm rank\,} T + \dim \ker T = \dim U = 3$, so $\dim W = \dim \ker T=2$.
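The rank-nullity bookkeeping for this particular $T$ can be verified directly, viewing $T$ as the $1\times 3$ matrix $(1\;1\;1)$:

```python
import sympy as sp

# T(x, y, z) = x + y + z as a 1x3 matrix
T = sp.Matrix([[1, 1, 1]])

rank = T.rank()
nullity = len(T.nullspace())          # dim ker T = dim W
print(rank, nullity, rank + nullity)  # 1 2 3: rank + nullity = dim U = 3
```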
Hope this helps!
Best Answer
Perhaps you can just use the definition of kernel. A vector in $\mathbb R^4$ is a combination $X = a\partial_x + b \partial_y + c\partial_z + d\partial_w$. Then $X$ sits in the kernel of $\alpha$ if $\alpha(X) = 0$, which gives $$8a-4b+2c=0, \quad d \in \mathbb R.$$ Therefore $c=2b-4a$ and $X=a\partial_x + b\partial_y + (2b-4a)\partial_z+d\partial_w$. So the kernel of $\alpha$ is given by the span of $\partial_x - 4\partial_z, \partial_y + 2\partial_z, \partial_w$, and is thus three-dimensional.
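Writing $\alpha = 8\,dx - 4\,dy + 2\,dz$ as a row vector acting on $(a,b,c,d)$, the claimed kernel basis can be verified with sympy:

```python
import sympy as sp

# α = 8 dx − 4 dy + 2 dz (+ 0 dw) as a row vector acting on (a, b, c, d)
alpha = sp.Matrix([[8, -4, 2, 0]])

# Columns are the three claimed kernel vectors: ∂x−4∂z, ∂y+2∂z, ∂w
basis = sp.Matrix([[1, 0, -4, 0],
                   [0, 1,  2, 0],
                   [0, 0,  0, 1]]).T

print(alpha * basis)           # zero row: all three lie in ker α
print(basis.rank())            # 3: they are linearly independent
print(len(alpha.nullspace()))  # 3: ker α is three-dimensional
```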