I learnt about RN derivative from "Real Analysis" by Folland, and would advise you to check it out there (Chapter 3) as it may answer your coming questions. In particular, Theorem 3.5 answers your Q1. It state that
If $\nu$ is a finite signed measure and $\mu$ is a positive measure, then $\nu\ll \mu$ iff for any $\varepsilon >0 $ there exists $\delta > 0$ such that $\mu(E)<\delta$ implies $|\nu(E)|<\varepsilon $ for any mesaurable $E$.
Now, if $\mu$ is our probability measure and $F$ is the corresponding CDF, then choosing the following $E = \bigcup_{k=1}^n(t_k,t_{k+1}]$ gives us that $\nu\ll \lambda$ implies that $F$ is absolute continuous (as a function). Here $\lambda$ denote the Lebesgue measure.
Regarding Q2: the density is defined relatively to another measure. Whatever measure $Q$ you take, it always has density w.r.t. itself - please, tell me if this fact is not clear to you. Furthermore, indeed if $P = \lambda$ and $H = \delta_0$ then $Q$ does not admit density w.r.t. $P$, however it clearly admits density w.r.t. $Q$ itself.
In probability theory it may be confusing that most of the time we are talking about densities w.r.t. $\lambda$, so that we do not even mention $\lambda$ and say just "density". For that reason you may forget that we are talking about relative density, as there is no "absolute" density at least in measure theory. There density is exactly RN derivative, hence it requires specifying the "denominator" measure.
Q3: I am not sure what exactly you mean here. If $\nu\ll\mu$ we can define KL divergence by
$$
D(\nu,\mu) := \int \log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\nu = \int \frac{\mathrm d\nu}{\mathrm d\mu}\log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\mu \tag{1}
$$
and this is defined purely in terms of measures, so it does not depend on their representation through densities.
Regarding your title question, please check out this and that. I'm expecting you'll reconsider and (or) reformulate your question after reading this answer, unless everything already became clear to you. Just come back and we can proceed. And I encourage you to check Folland's book in general.
Added: let's agree on the following - since there is some confusion regarding the notion of the density, we only use terms "function" and "RN derivative". We can define KL divergence $D(\nu,\mu)$ for measures $\nu\ll\mu$ as in $(1)$. We can also fix some reference measure $\psi$ and define a similar map for functional arguments, that is let
$$
\bar D_\psi(g,f):= \int g \log\left(\frac gf\right)\mathrm d\psi \tag{1'}
$$
for which to be well-defined, we assume that
$$
\{f = 0\} \subseteq \{g = 0\} \tag{2}.
$$
Now, these two notions are relates as follows: $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$ where
$$
\bar\nu(\cdot) := \int_{(\cdot)}g\,\mathrm d\psi\qquad \bar\mu(\cdot) := \int_{(\cdot)}f \,\mathrm d\psi
$$
and of course $(2)$ implies that $\nu\ll\mu$. So indeed, to talk about the set $\mathcal G$ of all functions $g$ you need to assume that every function from this set satisfies $(2)$: but if you don't assume that the KL divergence would be infinite for those $g$ (you take integral of $\log$ of infinity) so for sure it is greater than $\epsilon$.
Let me also summarize some relations in one-dimensional case. The basic object is the probability measure $\mu:\mathscr B(\Bbb R) \to [0,1]$. Its CDF is a function on real numbers $F_\mu:\Bbb R\to [0,1]$, which is given by $F_\mu(x):=\mu((-\infty,x])$; hence, to each probability measure there corresponds its unique CDF. Vice-versa, from any function satisfying a couple of properties we can construct a probability measure whose CDF is given by the latter function, see e.g. here. Thus, probability measures on real line and CDFs are in one-to-one correspondence, only the former is a function of sets, whereas the latter is the function of real numbers. If $\mu \ll \lambda$ then its RN derivative $f_\mu := \frac{\mathrm d\mu}{\mathrm d\lambda}:\Bbb R \to \Bbb R_+$ is commonly referred to as a density function of $\mu$, however it would be more formal to say that $f_\mu$ is the density of $\mu$ w.r.t. $\lambda$. Notice that
$$
F_\mu(x) = \int_{-\infty}^x\mathrm \mu(\mathrm dt) = \int_{-\infty}^xf_\mu(t)\, \lambda(\mathrm dt),
$$
hence if $\mu\ll\lambda$, then by LDT we have that $F'_\mu(x)$ exists $\lambda$-a.e. and $F'_\mu(x) = f_\mu(x)$ ($\lambda$-a.e.) For example, if $F_\mu\in C^1(\Bbb R)$ then $F'_\mu$ is a version of the RN derivative $\frac{\mathrm d\mu}{\mathrm d\lambda}$, and by changing $F'_\mu$ on $\lambda$-null sets in any way we can obtain other versions of that RN derivative (since RN derivative is only defined uniquely $\lambda$-a.e.). In fact, in most of the practical cases we compute RN derivatives using usual derivatives; there are not many other methods to compute RN derivatives.
Best Answer
By using Radon-Nikodym, this follows directly, but you can do it using 'standard' techniques (namely following the construction of the integral). We want to prove $\def\P{\mathbb P}$ $$ \tag 1 \int_\Omega g \,d\tilde\P = \int_\Omega fg\,d\P$$ for all $g\in L^1(\tilde\P)$. (1) holds by definition of $\tilde P$ for characteristic functions, as for $g = \chi_A$ we have $$ \int_\Omega g \,d\tilde\P = \tilde P(A) = \int_A f \,d\P = \int_\Omega fg\,d\P $$ By linearity, (1) holds for simple functions, i. e. linear combinations of characteristic functions. If now $g$ is non-negative measurable, choose simple $g_n$ such that $g_n \nearrow g$. Then, we have $fg_n \nearrow fg$, hence $$ \int_\Omega g \,d\tilde\P = \lim\int_\Omega g_n \,d\tilde\P = \lim\int_\Omega fg_n\,d\P ) \int_\Omega fg_n\,d\P $$ For general $g$, now write $g = g^+ - g^-$ and use linearity again.