The Dirac delta is what is called a generalised function.
Basically, the Dirac delta is defined as a useful exception to the usual rule that a single point (an atom) has Lebesgue measure zero and therefore contributes nothing to an integral.
Being an exception, it naturally does not follow that rule. So, where it is useful to do so, we might extend the notion of expectation and say, for a random variable $X$ with density $f$:
$$\mathsf E(\delta(X-c)) = f(c)$$
The idea is that the "Dirac delta function" at $0,$ denoted by $\delta_0,$ is not an ordinary function, yet we can still integrate against it. For every continuous $g$ we have
$$\int_{\mathbb R} g\,\delta_0 = g(0).$$
However, no integrable function could have this property. That is, if $f$ is integrable on $[-a,a],$ then
$$\int_{-a}^a g(x) f(x)\,dx = g(0)$$
will fail for some continuous $g.$
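One way to see this, sketched under the assumption that "integrable" means Lebesgue integrable (the tent functions $g_n$ below are notation introduced only for this sketch): for $n \ge 1/a$ let $g_n(x) = \max(0,\, 1 - n|x|),$ which is continuous, satisfies $0 \le g_n \le 1$ and $g_n(0) = 1,$ and vanishes outside $[-1/n,1/n].$ Then

$$\left|\int_{-a}^a g_n(x) f(x)\,dx\right| \le \int_{-1/n}^{1/n} |f(x)|\,dx \longrightarrow 0 \quad (n\to\infty),$$

because the integral of $|f|$ over a set of shrinking measure tends to $0.$ So for large $n$ the left side cannot equal $g_n(0) = 1.$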
On the other hand, and staying with elementary means, we can find a sequence $f_n$ of continuous functions on $\mathbb R$ such that
$$\lim_{n\to \infty}\int_{\mathbb R} g(x) f_n(x)\,dx = g(0)$$
for every continuous $g$ on $\mathbb R .$
Proof: Define $f_n$ on $[-1/n,1/n]$ to have the triangular graph that joins the points $(-1/n,0), (0,n), (1/n,0);$ define $f_n=0$ everywhere else. The $f_n$'s live in ever smaller intervals centered at $0,$ but each triangle has base $2/n$ and height $n,$ so $\int_{\mathbb R} f_n = \tfrac12\cdot\tfrac2n\cdot n = 1$ for every $n.$
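Let $g$ be continuous. Then for each $n$ there exists $c_n\in [-1/n,1/n]$ such that $|g(c_n)-g(0)|$ is the maximum of $|g(x)-g(0)|$ on the interval $[-1/n,1/n]$ (a continuous function attains its maximum on a closed bounded interval). Since $f_n\ge 0,$ $\int_{\mathbb R} f_n = 1,$ and $f_n$ vanishes outside $[-1/n,1/n],$ we get
$$\begin{aligned}\left|\int_{\mathbb R} g(x) f_n(x)\,dx - g(0)\right| &= \left|\int_{\mathbb R} [g(x)-g(0)] f_n(x)\,dx\right| \\ &\le |g(c_n)-g(0)| \int_{\mathbb R} f_n = |g(c_n)-g(0)|\cdot 1.\end{aligned}$$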
As $n\to \infty,$ $c_n\to 0,$ so the last expression $\to 0$ by the continuity of $g$ at $0.$
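The convergence in this proof can also be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `tent` and `pair_with_tent`, the midpoint rule, and the panel count are all choices made here for illustration, and $g=\cos$ is just one convenient continuous function with $g(0)=1.$

```python
import math

def tent(n, x):
    """Triangular f_n from the proof: height n at 0, zero outside [-1/n, 1/n]."""
    return max(0.0, n - n * n * abs(x))

def pair_with_tent(g, n, panels=2000):
    """Midpoint-rule approximation of the integral of g * f_n over [-1/n, 1/n].

    panels is even, so the kink of f_n at 0 falls on a panel boundary.
    """
    h = 2.0 / (n * panels)
    total = 0.0
    for k in range(panels):
        x = -1.0 / n + (k + 0.5) * h
        total += g(x) * tent(n, x)
    return total * h

for n in (1, 10, 100, 1000):
    mass = pair_with_tent(lambda x: 1.0, n)   # should stay close to 1
    value = pair_with_tent(math.cos, n)       # should approach cos(0) = 1
    print(f"n={n:5d}  integral of f_n = {mass:.6f}  integral of cos*f_n = {value:.6f}")
```

For $g=\cos$ the integral can also be computed by hand as $2n^2\,(1-\cos(1/n)),$ which tends to $1 = \cos 0,$ so the printed values can be compared against that closed form.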
Best Answer
I think that part of your confusion comes from the fact that the action of the $\delta$-distribution (functional) on a test function is written using the integral sign. In fact, you should think of the integral $$\int\limits_{-\infty}^{+\infty}{dx f(x)\delta (x-a)}$$ just as a symbol (a notation). It is not a genuine integral, and $\delta(x-a)$ is not a genuine function that could serve as part of an integrand. To avoid confusion, it should be written with the usual notation for the duality product: $\langle \delta(x-a),f\rangle=\langle \delta_a,f\rangle =f(a)$.
Many distributions can indeed be represented by an integral against a Lebesgue measurable function. If $\phi$ is a distribution and $g\in L_{loc}^1(\mathbb R)$ satisfies $\langle \phi, f\rangle=\int\limits_{-\infty}^{+\infty}{g(x)f(x)dx}$ for all test functions $f$, then the distribution $\phi$ is identified with the locally integrable function $g$ (in fact, every $g\in L_{loc}^1(\mathbb R)$ defines a distribution by the above integral). Often the same letter is used for the distribution and the function, i.e. one writes $\langle \phi, f\rangle=\int\limits_{-\infty}^{+\infty}{\phi(x)f(x)dx}$. Distributions that can be represented by such an integral are called regular.

It is well known that the $\delta$ distribution is not regular, i.e. it cannot be expressed as an integral against any locally integrable function $g$. Still, many authors prefer to use the integral notation, as for regular distributions, in order to keep a unified notation and exposition. Therefore, the integral $\int\limits_{-\infty}^{+\infty}{dx f(x)\delta (x-a)}$ is just a symbol, understood as if there really were a locally integrable function $\delta(x-a)$ for which $\langle \delta(x-a),f\rangle= \int\limits_{-\infty}^{+\infty}{dx f(x)\delta (x-a)}=f(a)$ (where we again use the same letter for the distribution and the "$L_{loc}^1$-function").
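To make the distinction concrete, here is a rough sketch that treats a distribution simply as a map from functions to numbers. Everything in it is an illustrative assumption rather than anything from the references: the names `regular`, `delta`, and `bump`, the midpoint quadrature, the truncation of the real line to $[-10,10]$, and the use of a Gaussian as a stand-in for a test function (it decays rapidly but is not compactly supported).

```python
import math

def regular(g, lo=-10.0, hi=10.0, panels=20000):
    """Regular distribution induced by a locally integrable g: f -> integral of g*f.

    The real line is truncated to [lo, hi]; this assumes f is negligible outside.
    """
    h = (hi - lo) / panels
    def pairing(f):
        total = 0.0
        for k in range(panels):
            x = lo + (k + 0.5) * h   # midpoint of the k-th panel
            total += g(x) * f(x)
        return total * h
    return pairing

def delta(a):
    """Dirac delta at a: f -> f(a). No density g and no integral is involved."""
    return lambda f: f(a)

def bump(x):
    # Gaussian used as a stand-in for a test function
    return math.exp(-x * x)

one = regular(lambda x: 1.0)                            # the constant function 1
heaviside = regular(lambda x: 1.0 if x > 0 else 0.0)    # unit step function

print(one(bump), math.sqrt(math.pi))            # pairing with 1 vs. exact integral
print(heaviside(bump), math.sqrt(math.pi) / 2)  # pairing with the step function
print(delta(0.5)(bump), bump(0.5))              # <delta_a, f> is just f(a)
```

Both kinds of object expose the same interface, a pairing $f \mapsto \langle \phi, f\rangle,$ but only the regular ones are built from a function $g$ under an integral; `delta` is defined directly by evaluation.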
A good reference that I recommend you check is "Green's Functions and Boundary Value Problems" (3rd edition) by Ivar Stakgold and Michael Holst.