There is no formula which looks only at the generic point(s) of $V \cap W$; you need to understand the entire sheaf $\mathcal{T}or_j^{\mathcal{O}_X}(\mathcal{O}_V, \mathcal{O}_W)$. It might be worth explaining the $K$-theory perspective on this.
Let $K_0(X)$ be the Grothendieck group of coherent sheaves on $X$. There is a ring structure on $K_0(X)$, where
$$[\mathcal{E}] [\mathcal{F}] = \sum (-1)^j [\mathcal{T}or_j^{\mathcal{O}_X}(\mathcal{E}, \mathcal{F}) ]$$
for any coherent sheaves $\mathcal{E}$ and $\mathcal{F}$. Here I am using $[\mathcal{A}]$ to mean "class of $\mathcal{A}$ in $K_0(X)$", and I am using that $X$ is smooth to guarantee that the sum is finite.
$K_0(X)$ has a descending filtration, $K_0(X) = K_0(X)_{0} \supseteq K_0(X)_{1} \supseteq K_0(X)_{2} \supseteq \cdots \supseteq (0)$, where $K_0(X)_i$ is spanned by classes of sheaves whose support has codimension at least $i$. This makes $K_0(X)$ into a filtered ring, meaning that
$$K_0(X)_i K_0(X)_j \subseteq K_0(X)_{i+j} \quad (\ast)$$
Containment $(\ast)$ is NOT obvious, and we will return to this point.
Let $gr \ K_0(X)$ be the associated graded ring $\bigoplus_{i \geq 0} K_0(X)_i/K_0(X)_{i+1}$. Then there is a map of graded rings from $gr \ K_0(X)$ to the Chow ring $A^{\bullet}(X)$. This map sends $[\mathcal{O}_V]$ to $[V]$.
So, let $V$ and $W$ live in codimensions $i$ and $j$. We want to compute $[V] [W]$ in $A^{i+j}(X)$. From the above, we see that it would be enough to compute
$$\sum_{p} (-1)^p [\mathcal{T}or_p^{\mathcal{O}_X}(\mathcal{O}_V, \mathcal{O}_W) ] \quad (\ast \ast)$$
as an element of $K_0(X)_{i+j}/K_0(X)_{i+j+1}$.
Every summand in $(\ast \ast)$ is supported on $V \cap W$. So, if $V \cap W$ lives in codimension $i+j$, then we can just compute the image of each summand separately in the quotient $K_0(X)_{i+j}/K_0(X)_{i+j+1}$. Working this out gives Serre's formula.
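For reference, the formula in question is usually stated as follows. The names $Z$ (an irreducible component of $V \cap W$ of codimension $i+j$) and $i(Z; V, W)$ are notation assumed here, not taken from the text above:
$$i(Z; V, W) = \sum_{p \geq 0} (-1)^p \operatorname{length}_{\mathcal{O}_{X,Z}} \operatorname{Tor}_p^{\mathcal{O}_{X,Z}}(\mathcal{O}_{V,Z}, \mathcal{O}_{W,Z}), \qquad [V][W] = \sum_Z i(Z; V, W) \, [Z].$$
Each length is what remains of the corresponding Tor sheaf after localizing at the generic point of $Z$, which is all that the map to $A^{i+j}(X)$ sees.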
Suppose now that $V \cap W$ has codimension $k$, which is less than $i+j$. Then the individual Tor terms live in $K_0(X)_k$ and plugging into Serre's formula gives the image of $(\ast \ast)$ in $K_0(X)_k/K_0(X)_{k+1}$. But, by containment $(\ast)$, the sum $(\ast \ast)$ actually lives farther down the filtration, in $K_0(X)_{i+j}$. This is why simply plugging into the formula you quote gives $0$.
An example might be useful. Take $X = \mathbb{P}^2$. Then $K_0(X)$ is isomorphic as an additive group to $\mathbb{Z}^3$, and we'll take as a basis the structure sheaf of $X$, the structure sheaf of a line, and the structure sheaf of a point. The filtration is given by
$$(\ast, \ast, \ast) \supseteq (0, \ast, \ast) \supseteq (0,0,\ast) \supseteq (0,0,0)$$
Consider intersecting a line $V$ with itself. $\mathcal{T}or_0$ is the tensor product $\mathcal{O}_V \otimes \mathcal{O}_V = \mathcal{O}_V$, whose class is $(0,1,0)$. $\mathcal{T}or_1$ is the restriction to $V$ of the ideal sheaf of $V$. This is $\mathcal{O}_V(-1)$ and, as you can work out, it is $(0,1,-1)$ in the basis I chose. The other Tor terms are all zero.
So the individual Tor terms are $(0,1,0)$ and $(0,1,-1)$, which each live in $K_0(X)_1$. Those leading $1$ terms correspond to the lengths of the Tor modules at the generic point of $V$. In order to compute the intersection multiplicity, you have to see farther down in the filtration, to the element $(0,1,0) - (0,1,-1)$ in $K_0(X)_2$. Indeed, $(0,1,0) - (0,1,-1) = (0,0,1)$, showing that a line in the projective plane intersects itself in the class of a point.
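The coordinates in this example can be checked mechanically. The sketch below assumes the standard fact that a class in $K_0(\mathbb{P}^2)$ is determined by its Hilbert polynomial $n \mapsto \chi(\mathcal{F}(n))$; the helper names `chi_plane`, `chi_line` and `coords` are hypothetical, introduced only for illustration.

```python
# Sketch: read off classes in K_0(P^2) from Hilbert polynomials.
# Assumption: a class is determined by n -> chi(F(n)).
# All helper names below are hypothetical, chosen for this illustration.

def chi_plane(m):
    """chi(O_{P^2}(m)) = (m+1)(m+2)/2, valid for every integer m."""
    return (m + 1) * (m + 2) // 2

def chi_line(m):
    """chi(O_L(m)) = m + 1 for a line L in P^2."""
    return m + 1

def coords(h):
    """Coordinates (a, b, c) of the class with Hilbert polynomial h in the
    basis [O_X], [O_line], [O_point], whose Hilbert polynomials are
    (n+1)(n+2)/2, n+1 and 1 respectively."""
    a = h(2) - 2 * h(1) + h(0)  # second difference isolates the O_X coefficient
    b = h(1) - h(0) - 2 * a     # first difference, minus the O_X contribution
    c = h(0) - a - b            # all three basis polynomials equal 1 at n = 0
    return (a, b, c)

tor0 = coords(lambda n: chi_line(n))      # Tor_0 = O_V
tor1 = coords(lambda n: chi_line(n - 1))  # Tor_1 = O_V(-1)
alternating = tuple(x - y for x, y in zip(tor0, tor1))

# Cross-check via the resolution 0 -> O_X(-1) -> O_X -> O_V -> 0, which gives
# [O_V]^2 = [O_X] - 2[O_X(-1)] + [O_X(-2)].
square = coords(lambda n: chi_plane(n) - 2 * chi_plane(n - 1) + chi_plane(n - 2))

print(tor0, tor1, alternating, square)
```

Both routes give the class of a point: the alternating sum of the Tor terms and the square of $[\mathcal{O}_V]$ computed from the locally free resolution agree.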
In the case of differential geometry everything reduces to vector spaces. At any point $(x, y) \in X \times X$ we have
$$
T_{(x, y)} (X \times X) = T_x X \oplus T_y X .
$$
Using this identification the tangent space to the diagonal at a point $(x, x)$ is the subspace of $T_{(x, x)} (X \times X)$ given by
$$
T_{(x, x)} (\Delta) = \lbrace (\xi, \xi ) \mid \xi \in T_x X \rbrace \subset T_{(x, x)} (X \times X).
$$
On the other hand the normal space is the quotient
$$
(T_{(x, x)} X \times X) / T_{(x, x)} \Delta
$$
and we can identify this with $T_x X$ in at least two slightly different ways. Either
$$
\iota_1 \colon \xi \mapsto (\xi , - \xi) + T_{(x, x)} \Delta
$$
or
$$
\iota_2 \colon \xi \mapsto (-\xi , \xi) + T_{(x, x)} \Delta .
$$
We can of course also identify $T_x X$ and $T_{(x, x)} \Delta $ by
$ \xi \mapsto (\xi, \xi)$.
I guess one explanation for the two identifications of the normal bundle is that there is an involution $\tau \colon X \times X \to X \times X $ given by $\tau(x, y) = (y, x)$, which fixes the diagonal pointwise and hence acts trivially on the tangent space to the diagonal. As a result its derivative descends to an action on the normal bundle which interchanges the two identifications $\iota_1$ and $\iota_2$, that is, $\tau \circ \iota_1 = \iota_2$.
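Spelled out, as a quick check under the identifications above (writing $d\tau$ for the derivative of $\tau$, which swaps the two summands of $T_x X \oplus T_x X$):
$$
d\tau\bigl(\iota_1(\xi)\bigr) = (-\xi, \xi) + T_{(x, x)} \Delta = \iota_2(\xi), \qquad \iota_1(\xi) + \iota_2(\xi) = (0, 0) + T_{(x, x)} \Delta .
$$
So $\iota_2 = -\iota_1$, and $\tau$ acts on the normal bundle as multiplication by $-1$.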
Best Answer
I'd like to expand a bit on the excellent comments of Charles Staats and Donu Arapura. They both suggest understanding the self-intersection number of a curve as the number of fixed points of an infinitesimal deformation of the curve, which is manifestly the degree of the normal bundle when such a deformation exists. Here's a slightly more pedestrian route, which I think has the benefit of being rigorous and almost as intuitive.
Suppose we have two curves in a surface: $$\iota_C: C\hookrightarrow X, \quad \iota_D: D\hookrightarrow X.$$ If $C\cap D$ has dimension zero, the intersection number should manifestly be $$C\cdot D:=\dim\Gamma(C\cap D, \mathcal{O}_{C\cap D})=\dim\Gamma(C, \iota_C^*\mathcal{O}_D)=\dim\Gamma(C, \iota_C^*\mathcal{O}_D(D)).$$ (The twist in the last equality does nothing by our assumption on the dimension of $C\cap D$; I've just inserted it to simplify things slightly later on.) We'd like to write this as an Euler characteristic, to make it constant if we vary $C$ or $D$ in a flat family. But this is easy: since $\mathcal{O}_{C\cap D}$ has zero-dimensional support, it has no higher cohomology, so its Euler characteristic equals $C\cdot D$ as defined above. Line bundles are nice (and more importantly, are acyclic with respect to restriction), so we restrict the short exact sequence $$0\to \mathcal{O}_X\to \mathcal{O}_X(D)\to \mathcal{O}_D(D)\to 0$$ to $C$ and rewrite this Euler characteristic as $$\chi(\mathcal{O}_X(D)|_C)-\chi(\mathcal{O}_C)=\operatorname{deg}(\mathcal{O}_X(D)|_C).$$
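The last step can be spelled out as follows. This is a sketch assuming that $C$ is a projective curve, and that $C$ and $D$ share no component, so that a local equation of $D$ is a nonzerodivisor on $\mathcal{O}_C$ and the sequence stays exact after restriction to $C$:
$$0\to \mathcal{O}_C\to \mathcal{O}_X(D)|_C\to \mathcal{O}_{C\cap D}(D)\to 0,$$
$$C\cdot D = \chi(\mathcal{O}_{C\cap D}(D)) = \chi(\mathcal{O}_X(D)|_C)-\chi(\mathcal{O}_C) = \operatorname{deg}(\mathcal{O}_X(D)|_C),$$
where the final equality is Riemann-Roch on $C$, in the form $\chi(L) = \operatorname{deg} L + \chi(\mathcal{O}_C)$ for a line bundle $L$.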
I think this is a reasonably intuitive motivation for the definition of the intersection number. So to fully answer your question, one should give an intuitive reason for why $\mathcal{O}_X(D)|_D$ is $\mathcal{N}_{D/X}.$ Of course, this is just the definition of the normal bundle, but let's motivate the definition. First, why is the conormal bundle $I/I^2$, for $I$ the ideal sheaf of a closed subvariety $V\subset X$? Well, an element of $I/I^2$ is precisely a function on $X$ vanishing on $V$, taken modulo higher-order terms. A section of the normal bundle takes functions $f$ defined in a neighborhood of $V$ and vanishing on $V$, and differentiates them, but the partial derivative should depend only on the first-order part of $f$. So the normal bundle should be precisely $(I/I^2)^\vee$. When $V = D$ is a divisor, this is another name for $\mathcal{O}_X(D)|_D.$
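For a divisor the identification is a one-line computation. Assuming $D$ is an effective Cartier divisor, so that its ideal sheaf is the line bundle $I = \mathcal{O}_X(-D)$:
$$I/I^2 = I \otimes_{\mathcal{O}_X} \mathcal{O}_X/I = \mathcal{O}_X(-D)|_D, \qquad \mathcal{N}_{D/X} = (I/I^2)^\vee = \mathcal{O}_X(D)|_D.$$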
I hope that was some reasonable intuition/motivation.