To understand this properly, I suggest looking at distributions and at how operations on distributions are defined in the first place.
A distribution is an object that acts on the space of infinitely differentiable, compactly supported functions in a linear and continuous way (check a textbook or Wikipedia for the precise definition). That is, a distribution $T$ on a set $\Omega\subset\newcommand{\RR}{\mathbb{R}}\RR^n$ assigns to any infinitely differentiable and compactly supported function $\phi$ defined on $\Omega$ a complex number $T(\phi)$. One then notes that any locally integrable function $f$ defined on $\Omega$ induces a distribution $T_f$ via the operation
$$T_f(\phi) = \int_\Omega f(x)\phi(x)dx.$$
Now one can try to extend, by analogy, the operations that one can apply to a locally integrable function to distributions as well. Take, for example, translation (in the case $\Omega=\RR^n$): let $t_y(f)$ be given by $t_y(f)(x) = f(x-y)$. Observe that
$$T_{t_y(f)}(\phi) = \int t_y(f)(x)\phi(x)dx = \int f(x-y)\phi(x)dx = \int f(x)\phi(x+y)dx = T_f(t_{-y}(\phi)).$$
That is, "translating the function $f$ is the same as translating the test function $\phi$ in the opposite direction". Hence, one defines the translation of the distribution $T$ as
$$t_y(T)(\phi) := T(t_{-y}\phi).$$
(Work out that the translation of the Dirac $\delta$ is what you think it should be.) Now you should be able to do the same thing with the scaling $s_a(f)(x) = f(ax)$.
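For the Dirac $\delta$ at the origin, which acts by $\delta(\phi) = \phi(0)$, the exercise can be sketched as follows. The symbol $\delta_y$ for the point evaluation $\phi\mapsto\phi(y)$ is a notation introduced here only for illustration. Using $(t_{-y}\phi)(x) = \phi(x+y)$,
$$t_y(\delta)(\phi) = \delta(t_{-y}\phi) = (t_{-y}\phi)(0) = \phi(0+y) = \phi(y) = \delta_y(\phi).$$
So $t_y(\delta)$ is the delta located at $y$, which is what the informal notation $\delta(x-y)$ suggests.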
Keep in mind that the left-hand side $$\int_{-\infty}^{\infty} f(x) \delta(x - a) dx$$ is not an inner product of two functions, since the Dirac delta isn't a function. It is properly understood in the context of distribution theory, where $\delta$ is regarded as a generalized function. That being said, you're right: the utility isn't in computing the value at $a$, but rather in knowing that the integral can be simplified.
Another nice reason to study the delta function is that it can be regarded (again, in the sense of distributions) as the derivative of the Heaviside function.
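A short sketch of why, assuming the standard definition of the distributional derivative, $T'(\phi) := -T(\phi')$ (motivated by integration by parts), and writing $H$ for the Heaviside function with $H(x) = 1$ for $x > 0$ and $H(x) = 0$ for $x < 0$. For a smooth, compactly supported test function $\phi$,
$$H'(\phi) = -\int_{-\infty}^{\infty} H(x)\phi'(x)\,dx = -\int_{0}^{\infty} \phi'(x)\,dx = \phi(0) = \delta(\phi),$$
where the contribution at infinity vanishes because $\phi$ has compact support.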
As an example of this that comes up in differential equations, suppose you have a mass hanging on a spring; the displacement $y$ from the equilibrium can then be modeled by
$$y'' + y = 0$$
which has oscillatory solutions. More generally, if we have a driving force $f(t)$ applied to the system, we have
$$y'' + y = f(t)$$
Now suppose we want to deliver a sharp impulse to the system to get it going initially; the delta function can be thought of as the limit of very short, but strong impulses acting on the system (and hence, as an instantaneous shock). This can be studied via the "ODE"
$$y'' + y = \delta(t - 1)$$
subject to initial conditions $y(0) = y'(0) = 0$ (so our block is not moving, and not displaced at time $0$; then we hit it $1$ second later). A standard way to study such equations is via the Laplace transform; recall that
$$\mathcal{L}\Big(f(t)\Big)(s) = \int_0^{\infty} e^{-st} f(t) dt$$
Now notice that
$$\mathcal{L}\Big(\delta(t-1)\Big)(s) = \int_0^{\infty} e^{-st} \delta(t - 1) dt = e^{-s}$$
The left-hand side of the ODE can be transformed just as easily, and standard techniques (e.g. a table of Laplace transforms) then give the final solution.
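Under the stated initial conditions, the computation can be sketched as follows. The names $Y(s)$ for the Laplace transform of $y$ and $H$ for the Heaviside step function are introduced here only for illustration. Since $y(0) = y'(0) = 0$, we have $\mathcal{L}(y'')(s) = s^2 Y(s)$, so
$$s^2 Y(s) + Y(s) = e^{-s} \quad\Longrightarrow\quad Y(s) = \frac{e^{-s}}{s^2+1}.$$
Because $\mathcal{L}(\sin t)(s) = \dfrac{1}{s^2+1}$ and multiplication by $e^{-s}$ corresponds to a delay by $1$, this gives
$$y(t) = H(t-1)\sin(t-1).$$
The mass stays at rest until $t = 1$ and then oscillates, as one would expect after an instantaneous kick.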
The key point is that the niceness with which $\delta$ interacts with integrals allows us to extend some techniques to difficult situations.
Best Answer
When $T$ is a distribution and $k$ is a positive number, how should we define the rescaled distribution $T_k$? When $T$ is represented by a function $g$, we just want $g(kx)$. In terms of the integral against a test function $f$, the substitution $y = kx$ gives $$ \int_{\mathbb{R}^n} g(kx) f(x) \,dx =k^{-n} \int_{\mathbb{R}^n} g(y) f(y/k) \,dy $$ So, for a general distribution $T$ we define the rescaled distribution $T_k$ as $$ T_k(f) = k^{-n} T(f(x/k)) $$
In the specific case of Dirac delta at $0$, the expression $k^{-n} T(f(x/k))$ evaluates to $ k^{-n}f(0)$. (You work in one dimension, $n=1$.)
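As a concrete check in one dimension, take $k = 2$ and the delta at $0$. The definition of $T_k$ gives
$$\delta_2(f) = 2^{-1}\,\delta\big(f(x/2)\big) = \tfrac{1}{2} f(0),$$
which agrees with the informal calculation, using the substitution $y = 2x$,
$$\int_{-\infty}^{\infty} \delta(2x) f(x)\,dx = \frac{1}{2}\int_{-\infty}^{\infty} \delta(y) f(y/2)\,dy = \tfrac{1}{2} f(0).$$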