Whenever I have trouble with functional derivatives, I just replace the continuous variable $x$ with a discrete index $i$. If I'm not mistaken, this is what they call "DeWitt notation".
The hand-waving idea is that you can think of a functional $F[f(x)]$ as an "ordinary function" of many variables $F(f_{-N},\cdots,f_0,f_1,f_2,\cdots,f_N) = F(\vec{f})$, with $N$ going to a "continuous infinity".
In that language, the functional derivative turns into a partial derivative with respect to one of the variables:
$$\frac{\delta F}{\delta f(x)} \to \frac{\partial F}{\partial f_i}$$
And the delta-function is just an ordinary Kronecker delta:
$$\delta(x-y) \to \delta_{ij}$$
So, gathering this up, we get for your expression:
$$\frac{\delta F}{\delta f(y)} = \lim_{\epsilon\to 0}\frac{F[f(x)+\epsilon\delta(x-y)]-F[f(x)]}{\epsilon} \to$$
$$\frac{\partial F}{\partial f_j} = \lim_{\epsilon\to 0}\frac{F[f_i+\epsilon\delta_{ij}]-F[f_i]}{\epsilon} $$
Which is, to my taste, a bit redundant. But true.
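To make the discretization concrete, here is a minimal numerical sketch with a toy functional of my own choosing (not from the answer above): $F[f]=\int_0^1 f(x)^2\,dx$, whose functional derivative is $\delta F/\delta f(y) = 2f(y)$. On a grid, $F$ becomes an ordinary function of the variables $f_i$, and $(1/\Delta x)\,\partial F/\partial f_j$ approximates the functional derivative at $x_j$ (the $1/\Delta x$ is how $\delta_{ij}/\Delta x$ mimics $\delta(x-y)$).

```python
import numpy as np

# Toy functional F[f] = ∫₀¹ f(x)² dx, discretized as F(f) = Σ_i f_i² Δx.
# Its functional derivative is δF/δf(y) = 2 f(y).
N = 1000
x = np.linspace(0.0, 1.0, N, endpoint=False)
dx = x[1] - x[0]
f = np.sin(2 * np.pi * x)

def F(fvec):
    return np.sum(fvec**2) * dx

j = 250                      # grid point x_j = 0.25
eps = 1e-6
f_pert = f.copy()
f_pert[j] += eps             # discrete analogue of f(x) + ε δ(x-y)
partial = (F(f_pert) - F(f)) / eps

# dividing by Δx converts ∂F/∂f_j into the continuum δF/δf(x_j)
functional_derivative = partial / dx
print(functional_derivative, 2 * f[j])   # both ≈ 2
```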
1) As OP basically notes, an $n$-dimensional delta function transforms under a change of variables $f:\mathbb{R}^n \to \mathbb{R}^n$ with (the absolute value of) an inverse Jacobian
$$ \tag{1} \delta^n(f(x))~=~ \sum_{x_{(0)},f(x_{(0)})=0 }\frac{1}{|\det(\partial f(x_{(0)}))|} \delta^n(x-x_{(0)}), $$
where the sum $\sum$ is over all zeroes $x_{(0)}$ of $f$, i.e. points $x_{(0)}$ where $f$ vanishes, $f(x_{(0)})=0$.
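Eq. (1) can be checked numerically in one dimension with a narrow Gaussian as a nascent delta function. The example below is my own choice: $f(x)=x^2-1$ has zeros at $x=\pm 1$ with $|f'(\pm 1)|=2$, so for a test function $g$ eq. (1) predicts $\int g(x)\,\delta(f(x))\,dx = \big(g(1)+g(-1)\big)/2$.

```python
import numpy as np

# Nascent delta: narrow normalized Gaussian
def nascent_delta(u, sigma=1e-3):
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-3.0, 3.0, 2_000_001)
dx = x[1] - x[0]
g = np.cos(x)        # arbitrary smooth test function
f = x**2 - 1         # zeros at ±1, |f'(±1)| = 2

lhs = np.sum(g * nascent_delta(f)) * dx            # ∫ g(x) δ(f(x)) dx
rhs = (np.cos(1.0) + np.cos(-1.0)) / 2             # Σ over zeros with 1/|f'|
print(lhs, rhs)
```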
2) A typical situation arises when one reparametrizes $m$ constraints
$$ \tag{2} \chi^{a}(x) ~\longrightarrow~ \chi^{\prime a}(x)~=~ \sum_{b=1}^{m} R^{a}{}_{b}(x) \chi^{b}(x) $$
in an $n$-dimensional space (say $\mathbb{R}^n$ for simplicity) with $m\leq n$. Here both sets of constraints
$$\tag{3} \chi^{1}(x)~=~0, \ldots, \chi^{m}(x)~=~0, \qquad \text{and} \qquad \chi^{\prime 1}(x)~=~0, \ldots, \chi^{\prime m}(x)~=~0,$$
are supposed to describe the same $n-m$ dimensional zero-locus set $\Sigma\subseteq\mathbb{R}^n$. Technically, let us also impose that the square matrix $R^{a}{}_{b}$, and both the rectangular matrices $\partial_i \chi^{a}$ and $\partial_i \chi^{\prime a}$, have maximal rank $m$ on the zero-locus set $\Sigma$. Then the delta function of constraints transforms with an inverse Jacobian
$$ \tag{4} \delta^m(\chi^{\prime }(x))
~=~ \frac{1}{|\det(R(x))|}\delta^m(\chi(x)). $$
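Eq. (4) can also be verified numerically. The toy setup below is my own choice, the simplest case $m=n=2$ with linear constraints $\chi(x)=x$ and $\chi'(x)=Rx$ for a constant invertible matrix $R$; eq. (4) then predicts $\int g(x)\,\delta^2(Rx)\,d^2x = g(0)/|\det R|$.

```python
import numpy as np

R = np.array([[2.0, 1.0], [0.5, 3.0]])   # arbitrary invertible 2x2 matrix

def nascent_delta2(u, sigma=1e-2):
    # product of two narrow Gaussians ≈ δ²(u)
    return np.exp(-(u**2).sum(axis=0) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

n = 1201
grid = np.linspace(-1.0, 1.0, n)
dx = grid[1] - grid[0]
X = np.stack(np.meshgrid(grid, grid, indexing="ij"))   # shape (2, n, n)

g = np.exp(-(X**2).sum(axis=0))          # smooth test function, g(0) = 1
chi_prime = np.tensordot(R, X, axes=1)   # χ'(x) = Rx on the grid

lhs = np.sum(g * nascent_delta2(chi_prime)) * dx**2    # ∫ g δ²(Rx) d²x
rhs = 1.0 / abs(np.linalg.det(R))                      # g(0) / |det R|
print(lhs, rhs)
```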
3) In all branches of physics it is important to keep track of such Jacobian factors.
E.g. in the path integral formulation of gauge theories, one imposes gauge-fixing conditions $\chi$ (also known as a choice of gauge) via a delta function $\delta(\chi)$. There it is important to insert a compensating Faddeev-Popov determinant, which transforms with the opposite Jacobian factor under reparametrizations of the gauge-fixing conditions, to keep the path integral independent of $\chi$.
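A finite-dimensional caricature of this point (my own toy example, not from the answer): the combination $\delta(\chi(x))\,|\chi'(x)|$ depends only on the zero locus of $\chi$, not on the particular choice of $\chi$; here the $|\chi'|$ factor plays the role of the compensating determinant. Both gauge conditions below vanish only at $x=a$, so both integrals should give $h(a)$.

```python
import numpy as np

a = 0.7
h = np.cos          # arbitrary smooth "integrand"

def nascent_delta(u, sigma=1e-4):
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-2.0, 2.0, 2_000_001)
dx = x[1] - x[0]

# two different "gauge-fixing" functions with the same single zero x = a
chi1, dchi1 = x - a, np.ones_like(x)
chi2, dchi2 = np.exp(x) - np.exp(a), np.exp(x)

# δ(χ)|χ'| gives the same answer for both choices of χ
I1 = np.sum(h(x) * nascent_delta(chi1) * np.abs(dchi1)) * dx
I2 = np.sum(h(x) * nascent_delta(chi2) * np.abs(dchi2)) * dx
print(I1, I2, np.cos(a))
```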
4) As far as we can tell, Ref. 1 is careful to keep the same convention throughout the book. Rather than speculate about which convention is the most natural, it is more important to be consistent. Concretely, Ref. 1 discusses volume elements for distributions of the form
$$\tag{5} d\tau~=~dV d\Gamma, \qquad dV~:=~dx~dy~dz,$$
where $d\Gamma$ denotes the volume element of the additional variables.
For a monoatomic gas, one may choose $d\Gamma=d^3p$, but one could also choose $d\Gamma=d^3v=m^{-3} d^3p$, where ${\bf p}=m{\bf v}$.
In the approximation that Ref. 1 works in, a diatomic molecule has no moment of inertia about its (symmetry) axis, and hence no angular momentum component along it: ${\bf M}\cdot {\bf n}=0$, where ${\bf n}$ is the unit vector ($|{\bf n}|=1$) in the direction of the axis. Hence it is natural to choose
$$\tag{6} d\Gamma~=~d^3p~d^3M \iint_{{\bf n}}\! d^2n~\delta({\bf M}\cdot {\bf n}). $$
With spherical decomposition of ${\bf M}=M{\bf m}$, where ${\bf m}\cdot {\bf n}=\cos\theta$, this becomes
$$ d\Gamma~=~d^3p~M^2dM~d^2m \iint_{{\bf n}}\! d^2n ~\delta(M\cos\theta)$$
$$~=~d^3p~MdM~d^2m \int_{0}^{2\pi}\! d\varphi \int_{-1}^{1}\!d\cos\theta ~\delta(\cos\theta)$$
$$\tag{7}~=~2\pi ~d^3p~MdM~d^2m, $$
which is eq. (1.1) in Ref. 1.
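As a sanity check of the step from eq. (6) to eq. (7) (my own verification, not from Ref. 1): for a fixed vector ${\bf M}$ of length $M$, integrating a nascent delta of ${\bf M}\cdot{\bf n}$ over the unit sphere of directions ${\bf n}$ should give $2\pi/M$.

```python
import numpy as np

M_len = 2.5   # |M|, arbitrary

def nascent_delta(u, sigma=1e-3):
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# d²n = dφ d(cosθ), with θ measured from the direction of M,
# so M·n = M cosθ and the φ integral just gives 2π
cos_theta = np.linspace(-1.0, 1.0, 2_000_001)
dc = cos_theta[1] - cos_theta[0]

integral = 2 * np.pi * np.sum(nascent_delta(M_len * cos_theta)) * dc
print(integral, 2 * np.pi / M_len)    # both ≈ 2π/M
```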
References:
- Pitaevskii and Lifshitz, Physical Kinetics.
Best Answer
It looks like a delta function. However, $\epsilon / (\epsilon^2+t^2)$ (by the way, you should omit one $\epsilon$ in the numerator to get the integral equal to one) decreases too slowly as $|t|\to\infty$, namely as $1/t^2$, so it only works as the Dirac delta distribution for test functions that decrease at infinity, or at least grow more slowly than $O(t)$. If the test function is $t^2$, for example, the integral $$ \int_{-\infty}^\infty dt\,t^2\,\delta(t) $$ should yield $0$, because $t^2=0$ at $t=0$. However, with your definition of the delta function, you get a divergent answer, because the infinite-range integral ultimately beats any $\epsilon$. For this reason, one usually wants approximations of the delta function that decrease faster than the Lorentzian as $|t|\to\infty$.
Obviously, if you include the extra $\epsilon$, you get $\epsilon\cdot \delta(t) = 0$ regardless of the details of the $|t|\to\infty$ behavior.
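The failure mode is easy to see numerically (a sketch of my own, assuming the normalized Lorentzian $\frac{1}{\pi}\frac{\epsilon}{\epsilon^2+t^2}$): it acts as a delta function on rapidly decaying test functions, but for $g(t)=t^2$ the tail of the integrand approaches the constant $\epsilon/\pi$, so the integral grows linearly with the cutoff instead of converging.

```python
import numpy as np

eps = 1e-3
lorentzian = lambda t: eps / (np.pi * (eps**2 + t**2))   # normalized: ∫ = 1

def integrate(g, T, n=4_000_001):
    # Riemann sum of g(t) * lorentzian(t) over [-T, T]
    t = np.linspace(-T, T, n)
    return np.sum(g(t) * lorentzian(t)) * (t[1] - t[0])

# Decaying test function: the integral converges to g(0) = 1
good = integrate(lambda t: np.exp(-t**2), T=1000.0)

# g(t) = t²: the "integral" keeps growing with the cutoff T
bad_small = integrate(lambda t: t**2, T=1000.0)
bad_large = integrate(lambda t: t**2, T=10000.0)
print(good, bad_small, bad_large)
```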