While chaohuang's answer gave you basically everything you needed, I'd like to elaborate a little on the last part, and on generalized functions (a.k.a. distributions) in general; I apologize for the unavoidable pun.
I realize that in physics, you might treat generalized functions, like the delta distribution, in a "classical" way, much as you would regular functions. The truth is, however, that distributions are only defined through how they act on a set of (nicely behaved) test functions. These test functions are usually either Schwartz-space functions or compactly supported smooth functions; occasionally we also consider the (much larger) test space of $C^\infty$ functions. So the reason $\hat{H}(\xi)$ involves $\text{PV}\left(\dfrac{1}{\xi}\right)$, as opposed to just $\dfrac{1}{\xi}$, is how it integrates against a test function in Schwartz space.
To make
\begin{align}
\hat{H}(\xi) &= \int_{-\infty}^{\infty}H(x)e^{-ix\xi}\;dx\\
&= \lim_{\epsilon\to0^+}\;\int_{0}^{\infty}e^{-\epsilon x}e^{-ix\xi}\;dx\\
&= \lim_{\epsilon\to0^+}\;(\epsilon+i\xi)^{-1}\\
&= \lim_{\epsilon\to0^+}\;-i(\xi-i\epsilon)^{-1}\\
&= -i(\xi-i0)^{-1}
\end{align}
rigorous, you need to see how it acts as a linear functional on Schwartz space:
\begin{align}
\hat{H}[\varphi] &= \int_{-\infty}^{\infty}\hat{H}(\xi)\varphi(\xi)\;d\xi\\
&= \lim_{\epsilon\to0^+}\;\int_{-\infty}^{\infty}-i(\xi-i\epsilon)^{-1}\varphi(\xi)\;d\xi\\
&= -i\lim_{\epsilon\to0^+}\;\int_{-\infty}^{\infty}\left(\frac{\xi+i\epsilon}{\xi^2+\epsilon^2}\right)\varphi(\xi)\;d\xi\\
&= -i\left(\lim_{\epsilon\to0^+}\;\int_{-\infty}^{\infty}\left(\frac{\xi}{\xi^2+\epsilon^2}\right)\varphi(\xi)\;d\xi + i\lim_{\epsilon\to0^+}\;\int_{-\infty}^{\infty}\left(\frac{\epsilon}{\xi^2+\epsilon^2}\right)\varphi(\xi)\;d\xi\right)\\
&=-i\lim_{\epsilon\to0^+}A(\epsilon)+\lim_{\epsilon\to0^+}B(\epsilon)
\end{align}
where
\begin{align}
A(\epsilon) &= \int_{-\infty}^{\infty}\left(\frac{\xi}{\xi^2+\epsilon^2}\right)\varphi(\xi)\;d\xi\\
&= \int_{-\infty}^{\infty}\frac{d}{d\xi}\left(\frac{1}{2}\ln(\xi^2+\epsilon^2)\right)\varphi(\xi)\;d\xi\\
&= -\frac{1}{2}\int_{-\infty}^{\infty}\ln(\xi^2+\epsilon^2)\varphi'(\xi)\;d\xi\\
\end{align}
and
\begin{align}
B(\epsilon) &= \int_{-\infty}^{\infty}\left(\frac{\epsilon}{\xi^2+\epsilon^2}\right)\varphi(\xi)\;d\xi\\
&= \int_{-\infty}^{\infty}\frac{d}{d\xi}\left(\arctan\left(\frac{\xi}{\epsilon}\right)\right)\varphi(\xi)\;d\xi\\
&= -\int_{-\infty}^{\infty}\arctan\left(\frac{\xi}{\epsilon}\right)\varphi'(\xi)\;d\xi\\
\end{align}
with
$$\lim_{\epsilon\to0^+}\;A(\epsilon) = -\int_{-\infty}^{\infty}\ln(|\xi|)\varphi'(\xi)\;d\xi$$
and
$$\lim_{\epsilon\to0^+}\;B(\epsilon) = -\frac{\pi}{2}\int_{-\infty}^{\infty}\text{sgn}(\xi) \varphi'(\xi)\;d\xi$$
by various Lebesgue integration theorems, and the fact that $\varphi$ and its derivatives are rapidly decaying. In particular, whereas the area under $1/\xi$ is infinite to either side of the singularity (making the integration of $\varphi/\xi$ ill-defined around $\xi=0$), the function $\ln|\xi|$ has finite integral near $\xi=0$, and so $\lim_{\epsilon\to0^+}\;A(\epsilon)$ is well-defined.
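This integrability contrast is easy to see numerically. A minimal sketch, assuming NumPy/SciPy are available:

```python
import numpy as np
from scipy.integrate import quad

# ln|xi| has finite integral across the singularity: ∫_0^1 ln(xi) d xi = -1
log_area = quad(np.log, 0, 1)[0]

# ...whereas ∫_cut^1 (1/xi) d xi = -ln(cut) grows without bound as cut -> 0+
one_over_xi = [quad(lambda x: 1/x, cut, 1)[0] for cut in (1e-2, 1e-4, 1e-6)]

print(log_area)     # close to -1 (finite)
print(one_over_xi)  # growing like -ln(cut): about 4.6, 9.2, 13.8
```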
An easy computation then gives us that
$$\lim_{\epsilon\to0^+}\;A(\epsilon) = \lim_{\epsilon\to0^+}\; \int_{\mathbb{R}\setminus(-\epsilon,\epsilon)}\frac{1}{\xi}\varphi(\xi)\;d\xi =: \text{PV}\left(\frac{1}{\xi}\right)[\varphi]$$
by definition of the Cauchy principal value distribution. Similarly, we find that
$$\lim_{\epsilon\to0^+}\;B(\epsilon) = 2\left(\frac{\pi}{2}\varphi(0)\right) = \pi\varphi(0) = \pi\delta[\varphi]$$
by definition of the delta distribution.
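Both limits can be verified numerically. The sketch below (assuming NumPy/SciPy) uses the hypothetical test function $\varphi(\xi)=(1+\xi)e^{-\xi^2}$, chosen to be Schwartz but not even so that the PV term does not trivially vanish; it compares the log-integral form of $\lim A(\epsilon)$ against a symmetrically truncated PV integral, and evaluates $B(\epsilon)$ at a small fixed $\epsilon$:

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical test function: phi(x) = (1 + x) * exp(-x^2), Schwartz and not even,
# so PV(1/xi)[phi] is a genuine principal-value computation.
phi  = lambda x: (1 + x) * np.exp(-x**2)
dphi = lambda x: (1 - 2*x - 2*x**2) * np.exp(-x**2)  # phi'(x)

# lim A(eps) via the log form: -∫ ln|xi| phi'(xi) d xi (integrable singularity at 0)
A = -(quad(lambda x: np.log(abs(x)) * dphi(x), -10, 0)[0]
      + quad(lambda x: np.log(abs(x)) * dphi(x), 0, 10)[0])

# PV(1/xi)[phi] via symmetric truncation around the singularity
cut = 1e-5
PV = (quad(lambda x: phi(x) / x, -10, -cut)[0]
      + quad(lambda x: phi(x) / x, cut, 10)[0])

# B(eps) at small eps should be close to pi * phi(0) = pi
eps = 1e-3
B = quad(lambda x: eps / (x**2 + eps**2) * phi(x), -10, 10, points=[0])[0]

print(A, PV)     # both close to sqrt(pi): PV ∫ phi(x)/x dx reduces to ∫ e^{-x^2} dx
print(B, np.pi)  # close to pi
```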
So in conclusion, we have finally arrived at the fact that
$$\hat{H} = -i\,\text{PV}\left(\frac{1}{\xi}\right) + \pi\delta.$$
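As a sanity check of this identity, one can use the distributional pairing $\hat{H}[\varphi] = H[\hat{\varphi}]$ (same transform convention as above) with the even test function $\varphi(\xi)=e^{-\xi^2}$, whose transform is $\sqrt{\pi}\,e^{-x^2/4}$. Since $\varphi$ is even, the PV term drops out and both sides should equal $\pi\varphi(0)=\pi$. A minimal numerical sketch, assuming NumPy/SciPy:

```python
import numpy as np
from scipy.integrate import quad

# phi(xi) = exp(-xi^2); with the convention phihat(x) = ∫ phi(xi) e^{-i x xi} d xi,
# its transform is sqrt(pi) * exp(-x^2 / 4).
phi_hat = lambda x: np.sqrt(np.pi) * np.exp(-x**2 / 4)

# Left side: H[phi_hat] = ∫_0^∞ phi_hat(x) dx
lhs = quad(phi_hat, 0, np.inf)[0]

# Right side: -i PV(1/xi)[phi] + pi * phi(0); the PV term vanishes (phi even)
rhs = np.pi * 1.0

print(lhs, rhs)  # both close to pi
```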
Taking Fourier transforms of such objects is actually a nasty business; consider, for instance, the Fourier transform of a constant function on $\mathbf{R}$.
If you want to prove $u'=\delta$ in the distribution sense, you should use the proper definition of the distributional derivative, as given by L. Schwartz.
Write $\left\langle f,\varphi\right\rangle$ for the value of the distribution $f$ applied to a test function $\varphi\in\mathcal{C}^\infty$ with compact support. You have
$$\left\langle\delta,\varphi\right\rangle=\varphi(0).\tag 1$$
If the distribution $f$ is a locally integrable function then
$$\left\langle f,\varphi\right\rangle=\int_{\mathbf{R}}f(x)\varphi(x)\,\mathrm{d}x.$$
Note that $\delta$ is not such a function, so the above formula does not apply for $\delta$.
With your notation, one can compute $\left\langle u,\varphi\right\rangle$:
$$\left\langle u,\varphi\right\rangle=\int_{\mathbf{R}}u(x)\varphi(x)\mathrm{d}x=\int_0^\infty\varphi(x)\mathrm{d}x.$$
To compute $u'$, apply now the definition of the derivative $f'$
$$\left\langle f',\varphi\right\rangle=-\left\langle f,\varphi'\right\rangle$$
to $f=u$. You get
$$\left\langle u',\varphi\right\rangle=-\left\langle u,\varphi'\right\rangle=-\int_0^\infty\varphi'(x)\mathrm{d}x.$$
As $\varphi$ has a compact support, it is equal to $0$ at infinity and the above equation gives
$$\left\langle u',\varphi\right\rangle=-\lim_{x\to\infty}\varphi(x)+\varphi(0)=\varphi(0)$$
which is exactly the definition of the distribution $\delta$ in equation (1). We have just proved that $u'=\delta$.
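The one-line computation above is easy to confirm numerically. The sketch below (assuming NumPy/SciPy, and using the rapidly decaying $\varphi(x)=e^{-x^2}$ as a stand-in for a compactly supported test function) checks that $-\int_0^\infty \varphi'(x)\,\mathrm{d}x = \varphi(0)$:

```python
import numpy as np
from scipy.integrate import quad

# Stand-in test function (rapid decay instead of compact support)
phi  = lambda x: np.exp(-x**2)
dphi = lambda x: -2 * x * np.exp(-x**2)  # phi'(x)

# <u', phi> = -<u, phi'> = -∫_0^∞ phi'(x) dx
lhs = -quad(dphi, 0, np.inf)[0]

print(lhs, phi(0.0))  # both close to 1
```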
To complement Dirk's answer:
Your inversion of differentiation can't work like that. There's a family of functions that differ by additive constants and all have the same derivative. Their Fourier transforms differ by deltas at the origin (proportional to the additive constants), so it can't be the case that you get the Fourier transform of all of them by dividing the transform of the derivative by $\mathrm i\omega$.
The catch is that multiplying by $\mathrm i\omega$ multiplies by $0$ at the origin and thus annihilates any delta that might be sitting there, and you can't reconstruct it by dividing by $\mathrm i\omega$. What you get by dividing by $\mathrm i\omega$ (if you interpret the pole appropriately) is the unique antiderivative with zero average. In your case, that is a step function that jumps from $-1/2$ to $1/2$.
Here's another way of looking at it: The function
$$H_\alpha(t)=\begin{cases}\mathrm e^{-\alpha t}&t\ge0\\0&t\lt0\end{cases}$$
defined in the note Dirk linked to decays at infinity and converges to the Heaviside function pointwise as $\alpha\to0$. Its Fourier transform is $\hat H_\alpha(\omega)=1/(\alpha+\mathrm i\omega)$, which converges to $1/(\mathrm i\omega)$ pointwise as $\alpha\to0$, except at $\omega=0$. What you're doing corresponds to setting $\alpha$ to zero directly and using the pointwise limit as the Fourier transform. However, that ignores the fact that while the imaginary part converges to that of $1/(\mathrm i\omega)$, the real part has a spike at the origin, which gets narrower as $\alpha\to0$ but whose integral is independent of $\alpha$:
\begin{align}
\int_{-\infty}^{\infty}\frac1{\alpha+\mathrm i\omega}\,\mathrm d\omega &=\int_0^{\infty}\left(\frac1{\alpha+\mathrm i\omega}+\frac1{\alpha-\mathrm i\omega}\right)\mathrm d\omega \\
&=\int_0^{\infty}\frac{2\alpha}{\alpha^2+\omega^2}\,\mathrm d\omega \\
&=2\left[\arctan\frac{\omega}{\alpha}\right]_0^{\infty} \\
&=\pi\;.
\end{align}
It's this spike of area $\pi$ that you're dropping when you use the pointwise limit $1/(\mathrm i\omega)$, which (if you interpret the pole appropriately) has average $0$.
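Both claims (the formula for $\hat H_\alpha$ and the $\alpha$-independent spike of area $\pi$) are easy to confirm numerically. A sketch assuming NumPy/SciPy, with arbitrary sample values of $\alpha$ and $\omega$:

```python
import numpy as np
from scipy.integrate import quad

# 1) Fourier transform of H_alpha at a sample point: ∫_0^∞ e^{-alpha t} e^{-i omega t} dt
#    (truncated at t = 50; the e^{-alpha t} tail beyond that is negligible)
alpha, omega = 1.0, 2.0
re = quad(lambda t: np.exp(-alpha*t) * np.cos(omega*t), 0, 50, limit=200)[0]
im = -quad(lambda t: np.exp(-alpha*t) * np.sin(omega*t), 0, 50, limit=200)[0]
numeric = re + 1j*im
exact = 1 / (alpha + 1j*omega)

# 2) Area of the real-part spike alpha/(alpha^2 + omega^2): pi for every alpha
areas = [quad(lambda w, a=a: a / (a**2 + w**2), -np.inf, np.inf)[0]
         for a in (1.0, 0.1, 0.01)]

print(numeric, exact)  # both close to 0.2 - 0.4j
print(areas)           # each close to pi
```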