The gradient of $\frac1r$ (noting that $r=\sqrt{x^2+y^2+z^2}$) is
$$
\nabla \frac1r = -\frac{\mathbf{r}}{r^3}
$$
when $r\neq 0$, where $\mathbf{r}=x\mathbf{i}+y\mathbf{j}+z\mathbf{k}$. Now, the divergence of this is
$$
\nabla\cdot \left(-\frac{\mathbf{r}}{r^3}\right) = 0
$$
when $r\neq 0$. Therefore, for all points for which $r\neq 0$,
$$
\nabla^2\frac1r = 0
$$
However, if we integrate this function over a sphere, $S$, of radius $a$, then, applying Gauss's Theorem, we get
$$
\iiint_S \nabla^2\frac1rdV = \iint_{\Delta S} -\frac{\mathbf{r}}{r^3}.d\mathbf{S}
$$
where $\Delta S$ is the surface of the sphere, and is outward-facing. Now, $d\mathbf{S}=\mathbf{\hat r}dA$, where $dA=r^2\sin\theta d\phi d\theta$. Therefore, we may write our surface integral as
$$\begin{align}
\iint_{\Delta S} -\frac{\mathbf{r}}{r^3}.d\mathbf{S}&=-\int_0^\pi\int_0^{2\pi}\frac{r}{r^3}r^2\sin\theta d\phi d\theta\\
&=-\int_0^\pi\sin\theta d\theta\int_0^{2\pi}d\phi\\
&= -2\cdot 2\pi = -4\pi
\end{align}$$
Therefore, the value of the laplacian is zero everywhere except zero, and the integral over any volume containing the origin is equal to $-4\pi$. Therefore, the laplacian is equal to $-4\pi \delta(\mathbf{r})$.
EDIT: The general case is then obtained by replacing $r=|\mathbf{r}|$ with $s=|\mathbf{r}-\mathbf{r_0}|$, in which case the function shifts to $-4\pi \delta(\mathbf{r}-\mathbf{r_0})$
In essence, it's the difference between a probability density function (pdf) and a probability. For example, if you consider a standard normal random variable $X$ with pdf $p$, we have $\mathbb{P}(X=0)=0$, but $p(0)=(2\pi)^{-1/2}$. In your example, you have $\mathbb{P}_{\delta_0}(X=0)=1$ but the pdf takes the value $\infty$ at $0$.
Best Answer
To understand this properly, I suggest to look at distributions and how operations with distributions are defined in the first place.
A distribution is an object that acts on the space of infinitely differentiable and compactly supported functions in a linear and continuous way (check a textbook or Wikipedia for the precise definition). I.e. a distribution $T$ on a set $\Omega\subset\newcommand{\RR}{\mathbb{R}}\RR^n$assigns to any infinitely differentiable and compactly supported function $\phi$ defined on $\Omega$ a complex number $T(\phi)$. Then one notes that locally integrable function $f$ defined on $\Omega$ induced a distribution $T_f$ via the operation $$T_f(\phi) = \int_\Omega f(x)\phi(x)dx.$$ Now one can try to define operations which one can do to a locally integrable function also for a distribution by analogy. Take, for example, translation (in the case $\Omega=\RR^n$): define the operation $t_y(f)$ defined by $t_y(f)(x) = f(x-y)$. Observe that $$T_{t_y(f)}(\phi) = \int t_y(f)(x)\phi(x)dx = \int f(x-y)\phi(x)dx = \int f(x)\phi(x+y)dx = T_f(t_{-y}(\phi)).$$ I.e. "translating the function $f$ is the same as translation the test function $\phi$ in the opposite direction". Hence, one defines the translation of the distribution $T$ as $$t_y(T)(\phi) := T(t_{-y}\phi).$$ (Work out, that the translation of the Dirac $\delta$ is what you think it should be.) Now you should be able to do the same thing with scaling $s_a(f)(x) = f(ax)$.