In general, weak derivatives are distributions, i.e., linear functionals on a space of smooth, compactly supported test functions. A (locally integrable) function $f$ is identified with the distribution $\phi \mapsto \int \phi f$. So two almost everywhere equal functions define the same distribution, and are the same as weak derivatives. This also means that questions about the value of weak derivatives at a certain point don't have well-defined answers, unless you have continuous versions.
In your example, the weak derivative is the function which is $0$ on $(-\infty,0)$ and $1$ on $(0,\infty)$, and you can give it any value at $0$, or even modify it on any set of measure $0$. Now if you want to differentiate this derivative again, then its weak derivative is not representable by a function anymore, due to the jump discontinuity at $0$. In this case, you would get the "Dirac $\delta$-function", which really is not a function, but the distribution associated to the measure $\mu$ with mass $1$ at $0$ and mass $0$ everywhere else. (The distribution associated to a measure $\mu$ is the functional $\phi \mapsto \int \phi \, d\mu$, in this case $\phi \mapsto \phi(0)$.) Now you can even differentiate this Dirac $\delta$ one more time, in which case you get a distribution which is not even representable by a measure anymore. It is the functional $\phi \mapsto -\phi'(0)$ (note the minus sign, which comes from the integration by parts in the definition of the distributional derivative), the third derivative of your original function in the sense of distributions.
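As a sketch, assume the function in question is $f(x)=\max(x,0)$ (an assumption, since the question is not reproduced here; this $f$ does have the weak derivative described above). With the convention $\langle T',\phi\rangle=-\langle T,\phi'\rangle$ for a test function $\phi$, and writing $H$ for the indicator function of $(0,\infty)$ (notation introduced only for this computation), the three derivatives work out as follows:
$$\begin{aligned}
\langle f',\phi\rangle &= -\int_0^\infty x\,\phi'(x)\,dx = \int_0^\infty \phi(x)\,dx = \int H\phi,\\
\langle f'',\phi\rangle &= -\int_0^\infty \phi'(x)\,dx = \phi(0) = \langle \delta,\phi\rangle,\\
\langle f''',\phi\rangle &= -\langle \delta,\phi'\rangle = -\phi'(0).
\end{aligned}$$
The boundary terms vanish because $\phi$ has compact support; the first line uses one integration by parts on $(0,\infty)$.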
If a function is differentiable everywhere with locally integrable derivative, then this derivative is also the weak derivative in the sense of distributions. However, note that there are everywhere differentiable functions whose derivative is not locally integrable (for instance $x^2\sin(1/x^2)$, extended by $0$ at $x=0$), and that there are continuous, almost everywhere differentiable functions with derivatives that are not their weak derivatives. (The classical example here is the "devil's staircase", whose derivative is almost everywhere $0$, and whose weak derivative is a measure on the Cantor set.)
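To see the devil's staircase example concretely, here is a short check. Let $c$ be the Cantor function, extended by $0$ to the left of $[0,1]$ and by $1$ to the right, and let $\mu_c$ be the associated Lebesgue-Stieltjes measure (the names $c$ and $\mu_c$ are notation chosen for this sketch). Since $c$ is continuous and nondecreasing, integration by parts gives, for every test function $\phi$,
$$-\int c\,\phi'\,dx = \int \phi\,d\mu_c, \qquad\text{whereas}\qquad \int c'\,\phi\,dx = 0,$$
because $c'=0$ almost everywhere. Choosing $\phi$ with $\phi=1$ on $[0,1]$ makes the left-hand side equal to $\mu_c([0,1])=1\neq 0$, so the pointwise derivative cannot be the weak derivative.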
Useful fact: a function with locally integrable weak derivative (i.e., a function of Sobolev class $W^{1,1}_{\rm loc}$) is approximately differentiable almost everywhere, and the approximate derivative agrees with the weak derivative. This is stated on page 8 here with a reference to the book by L. Ambrosio, N. Fusco, and D. Pallara, Functions of bounded variation and free discontinuity problems. Another reference is the book by L.C. Evans and R.F. Gariepy, Measure theory and fine properties of functions.
If a function has a strong (pointwise) derivative at $x$, then this derivative is also the approximate derivative at $x$. Hence, if a function is $W^{1,1}_{\rm loc}$ and differentiable a.e., then its weak derivative is represented by the pointwise derivative.
Best Answer
If $p>n$, then a function $f\in W^{1,p}_{\rm loc}(\mathbb{R}^n)$ is differentiable a.e. and its derivative coincides with the weak derivative a.e.
If $p\leq n$ (or even $f\in BV$) the function is in general only approximately differentiable a.e.
Both results can be found in Evans & Gariepy, Measure theory and fine properties of functions, section 6.1 (and if I recall correctly "a.e." can be replaced by "outside a set of zero $p$-capacity", which is slightly stronger).
To construct a counterexample to a.e. differentiability for $p$ strictly below $n$, consider a nonnegative function $\eta \in C^\infty (\mathbb{R}^n)$ with support in $B_1$ and with value $1$ on $B_{1/2}$, and enumerate the rational points as $\mathbb{Q}^n=\{q_i\}_{i\in \mathbb{N}}$. Choose a sequence $r_i\searrow 0$ to be specified later, and define $$f(x)=\sum_{i\in \mathbb{N}}\eta\left(\frac{x-q_i}{r_i}\right).$$ This is a sum of bumps centered at a dense set of points, with smaller and smaller supports. By the scaling of the $L^p$ norms you can check that indeed $f\in W^{1,p}(\mathbb{R}^n)$, provided $\sum_{i\in \mathbb{N}}r_i^{\frac{n}{p}-1}<\infty$ (which is compatible with $r_i\searrow 0$ because $\frac{n}{p}-1>0$).
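The scaling computation can be sketched as follows, writing $\eta_i(x)=\eta\left(\frac{x-q_i}{r_i}\right)$ for the $i$-th bump (the name $\eta_i$ is introduced here only for convenience). The change of variables $y=(x-q_i)/r_i$, $dx=r_i^n\,dy$, together with $\nabla\eta_i(x)=r_i^{-1}(\nabla\eta)\left(\frac{x-q_i}{r_i}\right)$, gives
$$\|\eta_i\|_{L^p}=r_i^{\frac{n}{p}}\,\|\eta\|_{L^p},\qquad \|\nabla\eta_i\|_{L^p}=r_i^{\frac{n}{p}-1}\,\|\nabla\eta\|_{L^p},$$
so by the triangle inequality
$$\|f\|_{W^{1,p}}\leq \sum_{i\in\mathbb{N}}\left(r_i^{\frac{n}{p}}\,\|\eta\|_{L^p}+r_i^{\frac{n}{p}-1}\,\|\nabla\eta\|_{L^p}\right).$$
Once $r_i\leq 1$ we have $r_i^{\frac{n}{p}}\leq r_i^{\frac{n}{p}-1}$, so the single condition $\sum_i r_i^{\frac{n}{p}-1}<\infty$ controls both terms and the series defining $f$ converges in $W^{1,p}$.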
However, the set where $f$ is nonzero is contained in $\bigcup_{i\in \mathbb{N}} B(q_i,r_i)$, whose measure can be made as small as wanted by sending $r_i$ quickly to zero, so $f$ vanishes on a set of positive measure. On the other hand, $f\geq 1$ on the open dense set $\bigcup_{i\in \mathbb{N}} B(q_i,r_i/2)$ (and the same holds almost everywhere on this set for any function in the same equivalence class), therefore it cannot be continuous, let alone differentiable, at any point where it attains the value zero.
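For a concrete instance of the measure estimate, here is one possible choice of radii (a hypothetical choice for illustration, with $\mathbb{N}$ starting at $1$ and $\omega_n$ denoting the volume of the unit ball): take $r_i=\varepsilon\,2^{-i}$ for some $\varepsilon\in(0,1]$. Then
$$\left|\bigcup_{i\in\mathbb{N}}B(q_i,r_i)\right|\leq \omega_n\sum_{i\geq 1}r_i^n=\omega_n\,\varepsilon^n\sum_{i\geq 1}2^{-in}=\frac{\omega_n\,\varepsilon^n}{2^n-1},$$
which is as small as desired for small $\varepsilon$, while $\sum_{i\geq 1}r_i^{\frac{n}{p}-1}=\varepsilon^{\frac{n}{p}-1}\sum_{i\geq 1}2^{-i\left(\frac{n}{p}-1\right)}$ is a convergent geometric series whenever $p<n$.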
I couldn't come up with a similar counterexample for $f\in W^{1,n}(\mathbb{R}^n)$, but maybe a similar construction would work.