I learnt about the RN derivative from "Real Analysis" by Folland, and would advise you to look there (Chapter 3), as it may answer your coming questions. In particular, Theorem 3.5 answers your Q1. It states that
If $\nu$ is a finite signed measure and $\mu$ is a positive measure, then $\nu\ll \mu$ iff for any $\varepsilon >0$ there exists $\delta > 0$ such that $\mu(E)<\delta$ implies $|\nu(E)|<\varepsilon$ for any measurable $E$.
Now, if $\mu$ is our probability measure and $F$ is the corresponding CDF, then choosing $E = \bigcup_{k=1}^n(t_k,t_{k+1}]$ shows that $\mu\ll \lambda$ implies that $F$ is absolutely continuous (as a function). Here $\lambda$ denotes the Lebesgue measure.
Regarding Q2: the density is defined relative to another measure. Whatever measure $Q$ you take, it always has a density w.r.t. itself - please tell me if this fact is not clear to you. Furthermore, indeed if $P = \lambda$ and $H = \delta_0$ then $Q$ does not admit a density w.r.t. $P$; however, it clearly admits a density w.r.t. $Q$ itself.
In probability theory it may be confusing that most of the time we talk about densities w.r.t. $\lambda$, so that we do not even mention $\lambda$ and just say "density". For that reason you may forget that we are always talking about a relative density: there is no "absolute" density, at least in measure theory. There, the density is exactly the RN derivative, and hence it requires specifying the "denominator" measure.
Q3: I am not sure what exactly you mean here. If $\nu\ll\mu$ we can define KL divergence by
$$
D(\nu,\mu) := \int \log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\nu = \int \frac{\mathrm d\nu}{\mathrm d\mu}\log\left(\frac{\mathrm d\nu}{\mathrm d\mu}\right)\mathrm d\mu \tag{1}
$$
and this is defined purely in terms of measures, so it does not depend on their representation through densities.
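As a sanity check, the two integrals in $(1)$ can be compared numerically. Here is a small sketch with two made-up discrete distributions (the point masses are hypothetical, not from the question); for discrete measures the RN derivative is just the ratio of point masses.

```python
import numpy as np

# Two discrete probability measures on {0, 1, 2}, with nu << mu
# (mu puts positive mass wherever nu does).  Example values are made up.
nu = np.array([0.5, 0.3, 0.2])
mu = np.array([0.25, 0.25, 0.5])

# RN derivative dnu/dmu is the ratio of point masses here.
rn = nu / mu

# The two integrals in (1): against nu, and against mu with the extra factor.
kl_wrt_nu = np.sum(nu * np.log(rn))
kl_wrt_mu = np.sum(mu * rn * np.log(rn))

print(kl_wrt_nu, kl_wrt_mu)  # identical up to floating point
```

Both expressions evaluate the same integrand against the same measure, just written in two ways, so they agree exactly.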
Regarding your title question, please check out this and that. I'm expecting you'll reconsider and (or) reformulate your question after reading this answer, unless everything already became clear to you. Just come back and we can proceed. And I encourage you to check Folland's book in general.
Added: let's agree on the following: since there is some confusion regarding the notion of density, we will only use the terms "function" and "RN derivative". We can define the KL divergence $D(\nu,\mu)$ for measures $\nu\ll\mu$ as in $(1)$. We can also fix some reference measure $\psi$ and define a similar map for functional arguments, that is let
$$
\bar D_\psi(g,f):= \int g \log\left(\frac gf\right)\mathrm d\psi \tag{1'}
$$
For this to be well-defined, we assume that
$$
\{f = 0\} \subseteq \{g = 0\} \tag{2}.
$$
Now, these two notions are related as follows: $\bar D_\psi(g,f) = D(\bar\nu,\bar\mu)$ where
$$
\bar\nu(\cdot) := \int_{(\cdot)}g\,\mathrm d\psi\qquad \bar\mu(\cdot) := \int_{(\cdot)}f \,\mathrm d\psi
$$
and of course $(2)$ implies that $\bar\nu\ll\bar\mu$. So indeed, to talk about the set $\mathcal G$ of all functions $g$ you need to assume that every function in this set satisfies $(2)$. If you don't assume that, the KL divergence is infinite for those $g$ (you integrate $\log$ of infinity over a set of positive measure), so it is certainly greater than $\epsilon$.
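To see concretely how the divergence becomes infinite when $(2)$ fails, here is a small numerical sketch; the point masses and the counting reference measure $\psi$ are my own made-up example.

```python
import numpy as np

# Discrete example on {0, 1, 2}: f vanishes at a point where g does not,
# so condition (2) fails and the divergence (1') is +infinity.
psi = np.ones(3)                   # counting measure as reference psi
g = np.array([0.5, 0.3, 0.2])
f = np.array([0.7, 0.3, 0.0])      # f = 0 at the last point, but g > 0 there

with np.errstate(divide="ignore"):
    ratio = np.where(g > 0, g / f, 1.0)       # g/f = +inf at the bad point
terms = np.where(g > 0, g * np.log(ratio), 0.0)
kl = np.sum(terms * psi)
print(kl)  # inf
```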
Let me also summarize some relations in the one-dimensional case. The basic object is the probability measure $\mu:\mathscr B(\Bbb R) \to [0,1]$. Its CDF is a function on the real numbers $F_\mu:\Bbb R\to [0,1]$, given by $F_\mu(x):=\mu((-\infty,x])$; hence, to each probability measure there corresponds a unique CDF. Vice versa, from any function satisfying a couple of properties we can construct a probability measure whose CDF is that function, see e.g. here. Thus, probability measures on the real line and CDFs are in one-to-one correspondence; only the former is a function of sets, whereas the latter is a function of real numbers. If $\mu \ll \lambda$ then its RN derivative $f_\mu := \frac{\mathrm d\mu}{\mathrm d\lambda}:\Bbb R \to \Bbb R_+$ is commonly referred to as a density function of $\mu$; however, it would be more precise to say that $f_\mu$ is the density of $\mu$ w.r.t. $\lambda$. Notice that
$$
F_\mu(x) = \int_{-\infty}^x\mu(\mathrm dt) = \int_{-\infty}^xf_\mu(t)\, \lambda(\mathrm dt),
$$
hence if $\mu\ll\lambda$, then by the Lebesgue differentiation theorem $F'_\mu(x)$ exists $\lambda$-a.e. and $F'_\mu(x) = f_\mu(x)$ $\lambda$-a.e. For example, if $F_\mu\in C^1(\Bbb R)$ then $F'_\mu$ is a version of the RN derivative $\frac{\mathrm d\mu}{\mathrm d\lambda}$, and by changing $F'_\mu$ on $\lambda$-null sets in any way we obtain other versions of that RN derivative (since the RN derivative is only defined uniquely $\lambda$-a.e.). In fact, in most practical cases we compute RN derivatives using ordinary derivatives; there are few other ways to compute them.
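As a quick numerical illustration of $F'_\mu = f_\mu$, take the exponential distribution with $F_\mu(x) = 1 - e^{-x}$ and $f_\mu(x) = e^{-x}$ (a standard textbook example, chosen here just for illustration):

```python
import math

# Exponential(1): F(x) = 1 - exp(-x), density f(x) = exp(-x) for x >= 0.
F = lambda x: 1.0 - math.exp(-x)
f = lambda x: math.exp(-x)

# Symmetric difference quotient of the CDF approximates the RN derivative.
def dF(x, h=1e-6):
    return (F(x + h) - F(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0]:
    print(x, dF(x), f(x))   # the two columns agree up to discretization error
```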
There is indeed a connection there, but you require a bit more regularity. You actually need $f\in W^{1,\infty}_{loc}(\mathbb{R})$, i.e. $f$ has to be locally Lipschitz. Then by Rademacher's theorem (see e.g. https://en.wikipedia.org/wiki/Rademacher%27s_theorem) $f$ has a classical derivative almost everywhere.
This and the monotonicity of $f$ are enough to show that $LS_f$ is a Radon measure which is absolutely continuous w.r.t. the Lebesgue measure $\mathcal{L}^1$. Since we are working in $\mathbb{R}$ we can apply the differentiation theorem for Radon measures, see e.g. Leon Simon's book on geometric measure theory. It states that the Radon-Nikodym derivative can be calculated as follows:
$$\frac{ d(LS_f)}{d\mathcal{L}^1}(x) = \lim_{r\rightarrow 0} \frac{LS_f(B_r(x))}{\mathcal{L}^1(B_r(x))} = \lim_{r\rightarrow 0}\frac{f(x+r)-f(x-r)}{2r} = f'(x)\ \mathcal{L}^1\mbox{-a.e.}$$
The last equality holds by Rademacher's theorem.
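The limit above can be sketched numerically. The monotone, locally Lipschitz function $f(x) = x + \sin x$ below is my own made-up example; its Lebesgue-Stieltjes measure of the ball $B_r(x)$ is $f(x+r)-f(x-r)$.

```python
import math

# Monotone locally Lipschitz f: f'(x) = 1 + cos(x) >= 0 everywhere.
f = lambda x: x + math.sin(x)
fprime = lambda x: 1 + math.cos(x)

x = 0.7
for r in [0.1, 0.01, 0.001]:
    # LS_f(B_r(x)) / L^1(B_r(x)) = (f(x+r) - f(x-r)) / (2r)
    ratio = (f(x + r) - f(x - r)) / (2 * r)
    print(r, ratio, fprime(x))   # ratio -> f'(x) as r -> 0
```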
EDIT: I did some research and I do not think the added regularity is really needed. For this one has to show that $f\in W^{1,1}$ already implies that $f$ is absolutely continuous, see e.g. https://www.math.ucdavis.edu/~hunter/m218a_09/ch3A.pdf Theorem 3.57.
This is indeed true: because $f'\in L^1$, for every $\varepsilon > 0$ there exists $\delta>0$ such that for all measurable $A\subseteq\mathbb{R}$ with $\mathcal{L}^1(A)<\delta$ we have
$$\int_A|f'|\, dx < \varepsilon.$$
This follows e.g. from Vitali's convergence theorem.
So let us check absolute continuity of $f$. We have to show the following: for every $\varepsilon>0$ there exists $\delta>0$ such that
for all $-\infty < a_1 < b_1 < a_2 < \ldots< a_n< b_n< \infty$ with
$$\sum_{k=1}^n (b_k-a_k) < \delta,\mbox{ we have }\sum_{k=1}^n \big(f(b_k) -f(a_k)\big) < \varepsilon.$$
Hence we choose $\delta$ as above for the integral estimate. Then
$$\mathcal{L}^1\left(\bigcup_{k=1}^n [a_k,b_k]\right) = \sum_{k=1}^n (b_k-a_k) < \delta$$
and therefore
$$\sum_{k=1}^n \big(f(b_k)-f(a_k)\big) = \sum_{k=1}^n\int_{a_k}^{b_k}f'(x)\, dx \leq \int_{\bigcup_{k=1}^n[a_k,b_k]}|f'(x)|\, dx < \varepsilon.$$
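The chain of (in)equalities above can also be checked numerically. The following sketch uses the made-up monotone function $f(x)=x^2$ on $[0,\infty)$ and a few arbitrary disjoint intervals:

```python
# Numerical check of the estimate above for the monotone function
# f(x) = x**2 on [0, oo), whose derivative f'(x) = 2x is locally integrable.
f = lambda x: x ** 2
fprime = lambda x: 2 * x

intervals = [(0.1, 0.3), (0.5, 0.6), (0.9, 1.2)]  # disjoint (a_k, b_k)

# Left-hand side: sum of increments of f.
lhs = sum(f(b) - f(a) for a, b in intervals)

# Right-hand side: integral of |f'| over the union, trapezoid rule.
def integral_abs_fprime(a, b, n=1000):
    h = (b - a) / n
    vals = [abs(fprime(a + i * h)) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

rhs = sum(integral_abs_fprime(a, b) for a, b in intervals)
print(lhs, rhs)  # equal here, since f' >= 0 on every interval
```

Since $f'\ge 0$ on each interval, the middle inequality is an equality in this example.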
It is called a derivative as a tribute to the (Riemann) fundamental theorem of calculus. It is not a classical derivative, since $\nu$ is most likely not a function of $\mu$.
The Radon-Nikodym theorem just states that whenever a (possibly signed) measure $\nu$ is absolutely continuous with respect to a sigma-finite measure $\mu$ on the same sigma-algebra, there exists a measurable function $f$ such that:
$$\nu(A)=\int_Af\,\mathrm{d}\mu$$
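As a concrete sketch of this identity (with a density of my own choosing), let $\mu$ be the Lebesgue measure on $[0,1]$ and $f(x)=2x$; then $\nu([0,t]) = t^2$, which simple quadrature confirms:

```python
# nu has density f(x) = 2x w.r.t. Lebesgue measure mu on [0, 1], so
# nu([0, t]) = t**2.  Check the defining identity by midpoint quadrature.
f = lambda x: 2 * x

def nu(t, n=1000):
    h = t / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

for t in [0.25, 0.5, 1.0]:
    print(t, nu(t), t ** 2)   # nu([0, t]) matches t**2
```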
If we treated this like a more familiar integral we might be tempted to differentiate both sides “with respect to $\mu$” because we are used to derivatives being the opposite of integrals, and we would “get” $d\nu=f\,d\mu$ as a convenient heuristic notation.
There do exist other types of derivative definitions for more abstract cases (e.g. the Fréchet derivative), but the Radon-Nikodym derivative is not that. However, saying that $f$ is the R-N derivative of $\nu$ with respect to $\mu$ does tell us "the density" of $\nu$, i.e. how the measure $\nu$ changes with the measure $\mu$, so intuitively it is good notation/terminology.
Just remember that an expression like
$$\lim_{h\to0}\frac{\nu(A+h)-\nu(A)}{\mu(h)}$$
would perhaps be a way to express a classical derivative, and here that expression does not make any sense unless we make many more definitions. It is the same idea as a derivative, but formally different from, say, $d/dx\,x^2=2x$.
You may be interested in the Lebesgue differentiation theorem. It is again not talking about a classical derivative, but is a much closer analogue.