I would say that conditioning and independence are genuinely distinct notions, whereas expectation is used a lot in measure theory as well, under the name of the Lebesgue integral.
The point is that probability as a science used to be arguably closer to physics than to mathematics, being based on experiments. It became the classical Probability Theory (PT) when it was axiomatized in the first half of the 20th century by means of Measure Theory (MT). So MT is clearly a mathematical basis for classical PT, and in that sense you can consider PT to be a subdiscipline of MT.
There are two points to mention, though.
There is an algebraic approach to probability which starts with algebras of random variables and defines a linear functional on such algebras, which plays the role of the expectation. Shall we say that Probability Theory is a subdiscipline of Abstract Algebra?
In both cases you start with something empirical: probabilities, random variables, etc. You wish them to satisfy certain properties, and by doing so you impose a particular structure: either measure-theoretic or algebraic. However, there is an additional layer of meaning in the results you obtain. For example, the Law of Large Numbers and the Central Limit Theorem are proved by purely measure-theoretic methods, yet these results are important precisely for Probability Theory. The probabilistic interpretation of MT provides you with additional intuition about "how it should be" and helps you understand "what it means".
That is entirely an opinion I've adopted for myself. Hope that it helps.
I learnt about the RN derivative from "Real Analysis" by Folland, and would advise you to check it out there (Chapter 3), as it may answer your forthcoming questions. In particular, Theorem 3.5 answers your Q1. It states that
If $\nu$ is a finite signed measure and $\mu$ is a positive measure, then $\nu\ll \mu$ iff for any $\varepsilon >0$ there exists $\delta > 0$ such that $\mu(E)<\delta$ implies $|\nu(E)|<\varepsilon$ for any measurable $E$.
Now, if $\mu$ is our probability measure and $F$ is the corresponding CDF, then applying the theorem with $\mu$ in the role of the signed measure and choosing $E = \bigcup_{k=1}^n(a_k,b_k]$ for pairwise disjoint intervals gives us that $\mu\ll \lambda$ implies that $F$ is absolutely continuous (as a function). Here $\lambda$ denotes the Lebesgue measure.
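To spell that step out (a quick sketch): given $\varepsilon>0$, pick $\delta>0$ as in the theorem; then whenever the disjoint intervals satisfy $\sum_{k=1}^n(b_k-a_k)<\delta$,
$$
\lambda(E) = \sum_{k=1}^n(b_k-a_k)<\delta \quad\Longrightarrow\quad \sum_{k=1}^n\bigl(F(b_k)-F(a_k)\bigr) = \mu(E) < \varepsilon,
$$
which is exactly the definition of absolute continuity of $F$ as a function.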
Regarding Q2: the density is defined relative to another measure. Whatever measure $Q$ you take, it always has a density w.r.t. itself; please tell me if this fact is not clear to you. Furthermore, indeed if $P = \lambda$ and $H = \delta_0$ then $Q$ does not admit a density w.r.t. $P$; however, it clearly admits a density w.r.t. $Q$ itself.
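To make the trivial case explicit: since $Q(E) = \int_E 1\,\mathrm dQ$ for every measurable $E$, the constant function $1$ is a version of the RN derivative, i.e.
$$
\frac{\mathrm dQ}{\mathrm dQ} = 1 \quad Q\text{-a.e.}
$$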
In probability theory it may be confusing that most of the time we talk about densities w.r.t. $\lambda$, to the point that we do not even mention $\lambda$ and just say "density". For that reason you may forget that we are always talking about a relative density; there is no "absolute" density, at least in measure theory. There, the density is exactly the RN derivative, hence it requires specifying the "denominator" measure.
Q3: I am not sure what exactly you mean here. If $\nu\ll\mu$, we can define the KL divergence by
$$
D(\nu,\mu) := \int \log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\nu = \int \frac{\mathrm d\nu}{\mathrm d\mu}\log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\mu \tag{1}
$$
and this is defined purely in terms of measures, so it does not depend on their representation through densities.
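For a concrete instance of $(1)$ (a standard example, not from your question): let $\nu$ and $\mu$ be Bernoulli measures on $\{0,1\}$ with parameters $p$ and $q\in(0,1)$, so that $\frac{\mathrm d\nu}{\mathrm d\mu}(1)=\frac pq$ and $\frac{\mathrm d\nu}{\mathrm d\mu}(0)=\frac{1-p}{1-q}$. Then
$$
D(\nu,\mu) = p\log\frac pq + (1-p)\log\frac{1-p}{1-q},
$$
and notice that no reference measure was needed at any point of the computation.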
Regarding your title question, please check out this and that. I expect you'll reconsider and/or reformulate your question after reading this answer, unless everything has already become clear to you. Just come back and we can proceed. And I encourage you to check out Folland's book in general.
Added: let's agree on the following: since there is some confusion regarding the notion of density, we will only use the terms "function" and "RN derivative". We can define the KL divergence $D(\nu,\mu)$ for measures $\nu\ll\mu$ as in $(1)$. We can also fix some reference measure $\psi$ and define a similar map for functional arguments; that is, let
$$
\bar D_\psi(g,f):= \int g \log\left(\frac gf\right)\mathrm d\psi \tag{1'}
$$
For $(1')$ to be well-defined, we assume that
$$
\{f = 0\} \subseteq \{g = 0\} \tag{2}.
$$
Now, these two notions are related as follows: $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$, where
$$
\bar\nu(\cdot) := \int_{(\cdot)}g\,\mathrm d\psi\qquad \bar\mu(\cdot) := \int_{(\cdot)}f \,\mathrm d\psi
$$
and of course $(2)$ implies that $\bar\nu\ll\bar\mu$. So indeed, to talk about the set $\mathcal G$ of all functions $g$, you need to assume that every function in this set satisfies $(2)$; if you don't assume that, then the KL divergence is infinite for the $g$ that violate $(2)$ (the integrand involves the logarithm of an infinite ratio), so it is certainly greater than $\epsilon$.
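For completeness, here is the computation behind the identity $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$: with the convention $\frac gf := 0$ on $\{f=0\}$ (harmless by $(2)$), for every measurable $E$
$$
\bar\nu(E) = \int_E g\,\mathrm d\psi = \int_E \frac gf\, f\,\mathrm d\psi = \int_E \frac gf\,\mathrm d\bar\mu,
$$
hence $\frac{\mathrm d\bar\nu}{\mathrm d\bar\mu} = \frac gf$ $\bar\mu$-a.e., and substituting this into $(1)$ yields exactly $(1')$.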
Let me also summarize some relations in the one-dimensional case. The basic object is the probability measure $\mu:\mathscr B(\Bbb R) \to [0,1]$. Its CDF is a function on the real numbers $F_\mu:\Bbb R\to [0,1]$, given by $F_\mu(x):=\mu((-\infty,x])$; hence, to each probability measure there corresponds a unique CDF. Vice versa, from any function satisfying a couple of properties we can construct a probability measure whose CDF is that function, see e.g. here. Thus, probability measures on the real line and CDFs are in one-to-one correspondence; only the former is a function of sets, whereas the latter is a function of real numbers. If $\mu \ll \lambda$ then its RN derivative $f_\mu := \frac{\mathrm d\mu}{\mathrm d\lambda}:\Bbb R \to \Bbb R_+$ is commonly referred to as the density function of $\mu$; however, it would be more precise to say that $f_\mu$ is the density of $\mu$ w.r.t. $\lambda$. Notice that
$$
F_\mu(x) = \int_{-\infty}^x \mu(\mathrm dt) = \int_{-\infty}^x f_\mu(t)\, \lambda(\mathrm dt),
$$
hence if $\mu\ll\lambda$, then by the Lebesgue Differentiation Theorem $F'_\mu(x)$ exists $\lambda$-a.e. and $F'_\mu(x) = f_\mu(x)$ ($\lambda$-a.e.). For example, if $F_\mu\in C^1(\Bbb R)$ then $F'_\mu$ is a version of the RN derivative $\frac{\mathrm d\mu}{\mathrm d\lambda}$, and by changing $F'_\mu$ on $\lambda$-null sets in any way we can obtain other versions of that RN derivative (since the RN derivative is only defined uniquely up to $\lambda$-a.e. equality). In fact, in most practical cases we compute RN derivatives using ordinary derivatives; there are not many other methods available.
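For a concrete illustration (the standard exponential example): take $\mu$ with
$$
F_\mu(x) = \bigl(1-e^{-x}\bigr)\,\mathbf 1_{\{x\ge 0\}}, \qquad F'_\mu(x) = e^{-x}\,\mathbf 1_{\{x> 0\}} = f_\mu(x)\quad\lambda\text{-a.e.};
$$
the derivative fails to exist only at $x=0$, a $\lambda$-null set, and assigning any value there produces another version of $\frac{\mathrm d\mu}{\mathrm d\lambda}$.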
Best Answer
The case you describe is the general case, since any two measures $\mu$ and $\nu$ are absolutely continuous with respect to $\mu+\nu$. More precisely, there exists $h_{\mu,\nu}$ with $0\leqslant h_{\mu,\nu}\leqslant1$ everywhere such that $\mu=h_{\mu,\nu}(\mu+\nu)$ and $\nu=(1-h_{\mu,\nu})(\mu+\nu)$. Thus one can define an intrinsic product $\mu\odot\nu$ by $$ \mu\odot\nu=h_{\mu,\nu}(1-h_{\mu,\nu})(\mu+\nu). $$ When $\mu$ and $\nu$ are absolutely continuous with respect to the Lebesgue measure (or any other reference measure) with densities $f$ and $g$ respectively, $\mu\odot\nu$ is absolutely continuous with respect to the Lebesgue measure with density $f\odot g$ defined as follows: on $[f+g=0]$, $f\odot g=0$, and, on $[f+g\ne0]$, $$ f\odot g=\frac{fg}{f+g}. $$ This product $\odot$ on measures is commutative (good), associative (good?), the total mass of $\mu\odot\nu$ is at most $\frac14$ times the sum of the masses of $\mu$ and $\nu$, so in particular the product of two probability measures is not a probability measure (not good?), $\mu\odot\mu=\frac12\mu$ for every $\mu$, and finally $\mu\odot\nu=0$ if and only if $\mu$ and $\nu$ are mutually singular (good?), since $\mu\odot\nu$ is always absolutely continuous with respect to both $\mu$ and $\nu$.
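A tiny discrete example (my own, just for illustration): take $\mu=\delta_0$ and $\nu=\tfrac12\delta_0+\tfrac12\delta_1$, with densities $f=(1,0)$ and $g=(\tfrac12,\tfrac12)$ with respect to the counting measure on $\{0,1\}$. Then
$$
(f\odot g)(0)=\frac{1\cdot\frac12}{1+\frac12}=\frac13,\qquad (f\odot g)(1)=0,\qquad\text{so}\qquad \mu\odot\nu=\tfrac13\delta_0,
$$
whose total mass $\frac13$ is indeed at most $\frac14(1+1)=\frac12$; and replacing $\nu$ by $\delta_1$, which is mutually singular with $\mu$, gives $fg\equiv 0$, hence $\mu\odot\nu=0$.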
Edit: To normalize things, another idea is to consider $\mu\Diamond\nu=2(\mu\odot\nu)$. In terms of densities, this corresponds to a harmonic mean, since $\mu\Diamond\nu$ has density $f\Diamond g$, where $$ \frac1{f\Diamond g}=\frac1{2(f\odot g)}=\frac12\left(\frac1f+\frac1g\right). $$ In particular, this new intrinsic product $\Diamond$ is idempotent (good?), commutative (good), and not associative (not good?).
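For instance, idempotence follows directly from the identity $\mu\odot\mu=\frac12\mu$ stated above:
$$
\mu\Diamond\mu = 2(\mu\odot\mu) = 2\cdot\tfrac12\,\mu = \mu.
$$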
Edit: A canonical product concerns probability measures and transition kernels. That is, one is given a measure space $(X,\mathcal X,\mu)$, a measurable space $(Y,\mathcal Y)$ and a function $\pi:X\times\mathcal Y\to[0,1]$ such that, for every $x$ in $X$, $\pi(x,\ )$ is a probability measure on $(Y,\mathcal Y)$. Then, under some regularity conditions, the product $\mu\times\pi$ is the unique measure on $(X\times Y,\mathcal X\otimes\mathcal Y)$ such that, for every $A$ in $\mathcal X$ and $B$ in $\mathcal Y$, $$ (\mu\times \pi)(A\times B)=\int_A\mu(\mathrm dx)\pi(x,B). $$ In particular, $B\mapsto(\mu\times\pi)(X\times B)$ is a probability measure on $(Y,\mathcal Y)$.
When $\mu$ has density $f$ with respect to a measure $\xi$ and each $\pi(x,\ )$ has density $g(x,\ )$ with respect to a measure $\eta$, $\mu\times\pi$ has density $(x,y)\mapsto f(x)g(x,y)$ with respect to the product measure $\xi\otimes\eta$.
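A familiar special case (my illustration, not from the original): take $X=Y=\Bbb R$, $\xi=\eta=\lambda$, $\mu=\mathcal N(0,1)$ and $\pi(x,\ )=\mathcal N(x,1)$. Then $\mu\times\pi$ has density
$$
(x,y)\mapsto \frac1{2\pi}\,e^{-x^2/2}\,e^{-(y-x)^2/2}
$$
with respect to $\lambda\otimes\lambda$, which is the joint density of $(X,X+Z)$ for independent standard normals $X$ and $Z$; accordingly, the marginal $B\mapsto(\mu\times\pi)(\Bbb R\times B)$ is $\mathcal N(0,2)$, a probability measure as claimed.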