See this answer, where I provided a primer on the Dirac Delta.
The heuristic statement $\delta(x)(f(x)-f(0))=0$ means that for each test function $f$ we have $D[f(x)-f(0)]=0$, where $D[\cdot]$ is the Dirac Delta functional.
We write the functional for $D$ formally as
$$D[\cdot]=\int_{-\infty}^{\infty}\delta(x)[\cdot]dx \tag 1$$
But the right-hand side of $(1)$ is not an integral. Rather, it shares many properties with integrals and is therefore useful notation. But it is only notation.
So, for a test function $f(x)$, we have
$$D[f(x)]=f(0)$$
and therefore
$$D[f(x)-f(0)]=f(0)-f(0)=0\tag 2$$
Finally, we interpret $(2)$ formally and write
$$\delta(x)(f(x)-f(0))=0$$
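To make the "it is only notation" point concrete, here is a minimal Python sketch in which $D$ is modeled as a plain function that takes a test function and returns a number; no integral is computed anywhere. The bump `phi` and the test function `f` built from it are hypothetical choices for illustration.

```python
import math

def phi(x):
    """Standard smooth bump: exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def f(x):
    # Hypothetical test function: smooth, with compact support in (-1, 1)
    return (1.0 + x) * phi(x)

def D(test_fn):
    """Dirac Delta functional: evaluate the argument at 0 (no integration)."""
    return test_fn(0.0)

def g(x):
    # g(x) = f(x) - f(0), the argument of D in (2)
    return f(x) - f(0.0)

print(D(f))  # f(0) = exp(-1) ≈ 0.3679
print(D(g))  # 0.0
```

Because $D$ only ever looks at the value of its argument at $0$, $D[f(x)-f(0)]=f(0)-f(0)=0$ holds exactly, for any choice of $f$.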
Textbooks that discuss the Dirac Delta heuristically often give the curiously nonsensical pointwise definition of $\delta(x)$
$$\delta(x)=
\begin{cases}
0,&x\ne 0\\\\
\infty,&x=0
\end{cases}
$$
which obviously is meaningless even with the additional condition that $\int_{-\infty}^{\infty}\delta(x)\,dx=1$.
This "hand-waving" description can be made rigorous by defining a family of functions $\delta_n(x)$ with the properties that
$$\lim_{n\to \infty}\delta_n(x)=
\begin{cases}
0,&x\ne 0\\\\
\infty,&x=0
\end{cases}
$$
and
$$\lim_{n\to \infty}\int_{-\infty}^{\infty}\delta_n(x)\,dx=1,\qquad \lim_{n\to \infty}\int_{-\infty}^{\infty}\delta_n(x)f(x)\,dx=f(0)\ \text{ for every test function } f \tag 3$$
One may then write $\delta(x)\sim \lim_{n\to \infty}\delta_n(x)$, with the interpretation provided by $(3)$. Examples of such families of functions include the pulse function
$$\delta_n(x)=
\begin{cases}
n/2,&-\frac{1}{n}\le x\le \frac{1}{n}\\\\
0,&\text{otherwise}
\end{cases}
$$
and the Gaussian function
$$\delta_n(x)=\frac{n}{\sqrt{\pi}}e^{-n^2x^2}$$
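A quick numerical check of both families, as a sketch in plain Python: a composite Simpson rule (a helper written here, not a library function) approximates $\int\delta_n\,dx$ and $\int\delta_n(x)\cos x\,dx$. Using $f(x)=\cos x$ is an assumption made because the pairings have closed forms, $n\sin(1/n)$ for the pulse and $e^{-1/(4n^2)}$ for the Gaussian, both tending to $\cos 0=1$; $\cos$ is not compactly supported, but that is harmless here since $\delta_n$ is concentrated near $0$.

```python
import math

def simpson(h, lo, hi, m=2000):
    """Composite Simpson rule for the integral of h over [lo, hi] (m even)."""
    if m % 2:
        raise ValueError("m must be even")
    dx = (hi - lo) / m
    total = h(lo) + h(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * h(lo + k * dx)
    return total * dx / 3

def pulse(n):
    """delta_n(x) = n/2 on [-1/n, 1/n], 0 otherwise."""
    return lambda x: n / 2 if abs(x) <= 1 / n else 0.0

def gaussian(n):
    """delta_n(x) = n / sqrt(pi) * exp(-n^2 x^2)."""
    return lambda x: n / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

def pulse_pair(n, f):
    # The pulse vanishes outside [-1/n, 1/n], so integrate only there.
    d = pulse(n)
    return simpson(lambda x: d(x) * f(x), -1 / n, 1 / n)

def gaussian_pair(n, f):
    # Beyond |x| = 10/n the Gaussian is below exp(-100), so truncate there.
    d = gaussian(n)
    return simpson(lambda x: d(x) * f(x), -10 / n, 10 / n)

def one(x):
    return 1.0

for n in (1, 10, 100):
    print(n,
          pulse_pair(n, one), pulse_pair(n, math.cos), n * math.sin(1 / n),
          gaussian_pair(n, one), gaussian_pair(n, math.cos), math.exp(-1 / (4 * n * n)))
```

The mass columns stay at $1$ for every $n$, while the $\cos$ columns match their closed forms and move toward $f(0)=1$ as $n$ grows.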
In this answer, I discussed the regularization used in potential theory for the $\mathscr{R}^3$ Dirac Delta $\delta(\vec r)$. There, the Dirac Delta is written
$$\begin{align}
\delta(\vec r)&\sim \lim_{a\to 0}\delta_{a}(\vec r)\\\\
&=\lim_{a\to 0} \frac{3a^2}{4\pi(r^2+a^2)^{5/2}}
\end{align}$$
where $\lim_{a\to 0}\int_{\mathscr{R}^3}f(\vec r)\,\delta_{a}(\vec r)\,dV=f(\vec 0)$ for every test function $f$.
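For a radially symmetric $f$ (an assumption that keeps the sketch one-dimensional), $dV=4\pi r^2\,dr$, and the substitution $r=a\tan\theta$ turns the pairing into $3\int_0^{\pi/2}\sin^2\theta\cos\theta\,f(a\tan\theta)\,d\theta$, which removes the slowly decaying $r^{-3}$ tail of $r^2\delta_a$. The Python sketch below, using the hypothetical test function $f(r)=e^{-r^2}$ and a hand-written Simpson helper, checks that the total mass is $1$ for every $a$ and that the pairing tends to $f(\vec 0)=1$ as $a\to 0$.

```python
import math

def simpson(h, lo, hi, m=20000):
    """Composite Simpson rule for the integral of h over [lo, hi] (m even)."""
    if m % 2:
        raise ValueError("m must be even")
    dx = (hi - lo) / m
    total = h(lo) + h(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * h(lo + k * dx)
    return total * dx / 3

def pair_3d(f, a, m=20000):
    """Integral over R^3 of f(|r|) * delta_a(r) dV for a radial function f.

    dV = 4*pi*r^2 dr and r = a*tan(t) turn 4*pi*r^2*delta_a(r) dr into
    3*sin(t)^2*cos(t) dt on [0, pi/2].
    """
    def integrand(t):
        return 3.0 * math.sin(t) ** 2 * math.cos(t) * f(a * math.tan(t))
    return simpson(integrand, 0.0, math.pi / 2, m)

def one(r):
    return 1.0

def gauss(r):
    # Hypothetical radial test function with f(0) = 1
    return math.exp(-r * r)

for a in (1.0, 0.1, 0.01, 0.001):
    print(a, pair_3d(one, a), pair_3d(gauss, a))
```

The mass column is $1$ independently of $a$ (it equals $3\int_0^{\pi/2}\sin^2\theta\cos\theta\,d\theta$), while the Gaussian column increases toward $1$ as $a$ shrinks.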
Finally, in this answer, I analyze the family of functions $\delta_{\epsilon}(x)=\frac{1}{\sqrt{\pi\,\epsilon}}e^{-\tan^2(x)/\epsilon}$ that describes the "train" of Dirac Deltas
$$\sum_{\ell =-\infty}^{\infty}\delta(x-\ell \pi)\sim \lim_{\epsilon \to 0}\frac{1}{\sqrt{\pi\,\epsilon}}e^{-\tan^2(x)/\epsilon}$$
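Near each zero $x=\ell\pi$ of $\tan$, $\tan x\approx x-\ell\pi$, so $\delta_\epsilon$ looks like the Gaussian family above with $\epsilon=1/n^2$, while near the poles $x=\ell\pi+\pi/2$ it is essentially $0$. The plain-Python sketch below (hand-written Simpson helper, hypothetical test function $f(x)=e^{-x^2/4}$) integrates $\delta_\epsilon f$ one period at a time over nine periods and compares the result with $\sum_{|\ell|\le 4}f(\ell\pi)$.

```python
import math

def simpson(h, lo, hi, m=4000):
    """Composite Simpson rule for the integral of h over [lo, hi] (m even)."""
    if m % 2:
        raise ValueError("m must be even")
    dx = (hi - lo) / m
    total = h(lo) + h(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * h(lo + k * dx)
    return total * dx / 3

def delta_train(eps):
    """delta_eps(x) = exp(-tan(x)^2 / eps) / sqrt(pi * eps)."""
    c = 1.0 / math.sqrt(math.pi * eps)
    return lambda x: c * math.exp(-math.tan(x) ** 2 / eps)

def pair_train(f, eps, periods=4, m=4000):
    """Integral of delta_eps * f over [ell*pi - pi/2, ell*pi + pi/2], |ell| <= periods.

    Each piece runs from pole to pole of tan, where delta_eps is numerically 0,
    so exactly one peak (at x = ell*pi) sits inside each piece.
    """
    d = delta_train(eps)
    total = 0.0
    for ell in range(-periods, periods + 1):
        mid = ell * math.pi
        total += simpson(lambda x: d(x) * f(x), mid - math.pi / 2, mid + math.pi / 2, m)
    return total

def f(x):
    # Hypothetical smooth, rapidly decaying test function
    return math.exp(-x * x / 4)

target = sum(f(ell * math.pi) for ell in range(-4, 5))  # sum of f(ell*pi), |ell| <= 4
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, pair_train(f, eps), target)
```

As $\epsilon$ decreases, the integral approaches the sum of point evaluations, which is exactly what the train $\sum_\ell\delta(x-\ell\pi)$ should produce.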
Best Answer
The delta function comes from the non-differentiability of the absolute value function at the point $0$: the function being differentiated has a jump there, and whenever that happens, a delta function centered at the jump gets added. Furthermore, the coefficient of the delta function is the "jump" of the function at the point, i.e., the right limit minus the left limit. In this case it is $-2a$, so the term $-2a\delta(x)$ gets added to the derivative.
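For instance, suppose (an assumption, since the question itself is not reproduced here) that the function being differentiated twice is $u(x)=e^{-a|x|}$. Then $u'(x)=g(x)=-a\,\operatorname{sgn}(x)\,e^{-a|x|}$ jumps from $a$ to $-a$ at $0$, and the claim is that $u''=a^2e^{-a|x|}-2a\,\delta(x)$ as distributions. The plain-Python sketch below checks this against a test function $\phi$: it computes the distributional derivative $-\int g\phi'\,dx$ directly and compares it with $\int a^2e^{-a|x|}\phi\,dx-2a\,\phi(0)$. The value $a=1.5$, the bump centered at $0.3$, and the Simpson helper are hypothetical choices; each integral is split at $0$ so the quadrature never straddles the jump.

```python
import math

A = 1.5  # hypothetical value of the parameter a
C = 0.3  # hypothetical center of the bump, chosen so that phi(0) != 0

def simpson(h, lo, hi, m=4000):
    """Composite Simpson rule for the integral of h over [lo, hi] (m even)."""
    if m % 2:
        raise ValueError("m must be even")
    dx = (hi - lo) / m
    total = h(lo) + h(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * h(lo + k * dx)
    return total * dx / 3

def phi(x):
    """Smooth bump exp(-1/(1 - s^2)) with s = x - C, supported on (C - 1, C + 1)."""
    s = x - C
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def dphi(x):
    """phi'(x) = phi(x) * (-2s / (1 - s^2)^2)."""
    s = x - C
    if abs(s) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * s / (1.0 - s * s) ** 2)

# g = -A sgn(x) exp(-A|x|), written as two smooth pieces so that each piece
# is evaluated at x = 0 with the correct one-sided limit.
def g_left(x):   # x <= 0, g(0-) = +A
    return A * math.exp(A * x)

def g_right(x):  # x >= 0, g(0+) = -A
    return -A * math.exp(-A * x)

def dg_classical(x):
    """Classical derivative of g away from 0: A^2 exp(-A|x|)."""
    return A * A * math.exp(-A * abs(x))

left, right = C - 1.0, C + 1.0
jump = g_right(0.0) - g_left(0.0)  # right limit minus left limit = -2A

# Distributional derivative of g applied to phi: minus the integral of g * phi'
lhs = -(simpson(lambda x: g_left(x) * dphi(x), left, 0.0)
        + simpson(lambda x: g_right(x) * dphi(x), 0.0, right))

# Classical derivative as a distribution, plus the jump term J * phi(0)
classical = (simpson(lambda x: dg_classical(x) * phi(x), left, 0.0)
             + simpson(lambda x: dg_classical(x) * phi(x), 0.0, right))
rhs = classical + jump * phi(0.0)

print(lhs, rhs, jump)
```

The two sides agree to quadrature accuracy, whereas the classical derivative alone misses them by $2a\,\phi(0)$, which is exactly the contribution of the $-2a\,\delta(x)$ term.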
More formally:
If $f$ is continuously differentiable except at a point $p$, where the one-sided limits $f(p^{\pm})$ exist and near which $f'$ is integrable, then
$$ Df(\phi) = f'(\phi) + J_f(p)\,\delta_{p}(\phi) $$
where $\phi$ is a $C^\infty$ function with compact support, $Df$ is the distributional derivative, $f'$ is the classical derivative treated as a distribution, $\delta_p(\phi)=\phi(p)$, and $J_f(p)=f(p^+)-f(p^-)$ is the jump at $p$ described earlier.
The proof of this is a little involved (rigorously, that is), but not difficult.
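Here is a sketch of that proof, assuming $f$ is $C^1$ on each side of $p$, the one-sided limits $f(p^\pm)$ exist, and $f'$ is locally integrable, so that each half-line integral can be integrated by parts; the boundary terms at $\pm\infty$ vanish because $\phi$ has compact support:
$$\begin{align}
Df(\phi)&=-\int_{-\infty}^{\infty}f(x)\,\phi'(x)\,dx\\\\
&=-\int_{-\infty}^{p}f(x)\,\phi'(x)\,dx-\int_{p}^{\infty}f(x)\,\phi'(x)\,dx\\\\
&=\Big(-f(p^-)\,\phi(p)+\int_{-\infty}^{p}f'(x)\,\phi(x)\,dx\Big)+\Big(f(p^+)\,\phi(p)+\int_{p}^{\infty}f'(x)\,\phi(x)\,dx\Big)\\\\
&=f'(\phi)+\big(f(p^+)-f(p^-)\big)\,\phi(p)\\\\
&=f'(\phi)+J_f(p)\,\delta_p(\phi)
\end{align}$$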
EDIT:
As Did pointed out, your logic prior to this is wrong: if your function is not continuous, then it cannot be differentiable. However, there is the concept of a weak derivative, and I think you should read up on it (the topic is called distributions) in order to understand the terms used in this answer. Furthermore, what is the delta function? It is a distribution, not a function; when I answered this question, I assumed you had this background. Do read up, because the subject is fascinating.