The two constructions are equivalent and their equivalence is based on the so-called thinning of Poisson processes.
Klenke starts from a homogeneous Poisson process with a large rate $\lambda$. Amongst the times of this process, when at $x$, a proportion $1+q(x,x)/\lambda$ is used to jump from $x$ to $x$ and, for every $y\ne x$, a proportion $q(x,y)/\lambda$ is used to jump from $x$ to $y$. The jumps $x\to x$ have no effect, hence one is left with a proportion $q(x,y)/\lambda$ of jumps $x\to y$ amongst a global population of potential jump times with density $\lambda$, that is, the correct rate $q(x,y)$.
The only condition for this construction to work is $1+q(x,x)/\lambda\geqslant0$ for every $x$, that is, $\lambda\geqslant\sup\limits_x[-q(x,x)]$, hence one can choose, as many authors do, $\lambda=\sup\limits_x[-q(x,x)]$ but any larger value of $\lambda$ will do as well.
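To make the condition on $\lambda$ concrete, here is a minimal sketch that builds the jump probabilities $1+q(x,x)/\lambda$ and $q(x,y)/\lambda$ used at the Poisson times. The function name `uniformized_chain` and the 3-state matrix `Q` are made up for illustration; they are not taken from Klenke.

```python
def uniformized_chain(Q, lam=None):
    """Return (lam, P) where P[x][y] is the probability of the move x -> y
    at a Poisson time, i.e. P = I + Q / lam."""
    n = len(Q)
    min_rate = max(-Q[x][x] for x in range(n))  # sup_x [-q(x,x)]
    if lam is None:
        lam = min_rate  # the smallest admissible choice
    if lam <= 0 or lam < min_rate:
        raise ValueError("need lam > 0 and lam >= sup_x [-q(x,x)]")
    P = [[(1.0 if x == y else 0.0) + Q[x][y] / lam for y in range(n)]
         for x in range(n)]
    return lam, P


# Hypothetical Q-matrix: off-diagonal entries >= 0, rows sum to 0.
Q = [[-2.0, 1.0, 1.0],
     [0.5, -0.5, 0.0],
     [1.0, 3.0, -4.0]]

lam, P = uniformized_chain(Q)                # lam = 4.0 here
lam_big, P_big = uniformized_chain(Q, 10.0)  # any larger value also works

# In both cases lam * P[x][y] recovers q(x,y) for y != x.
for row in P_big:
    print([round(p, 3) for p in row])
```

With the minimal choice of $\lambda$, the state with the largest exit rate never jumps to itself; with a larger $\lambda$, every state has a positive probability of a "silent" jump $x\to x$.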
Norris's construction is probably the more usual one, so I will not comment on it here, except to note that $\lambda$, the initial distribution in Norris, is related in no way whatsoever to $\lambda$, the positive real number in Klenke. (My impression is that Klenke's version, more elegant, is slowly replacing the other one in the probabilists' minds.)
Edit: The piece of Norris's construction missing from your account is that $\Pi$ is related to $Q$ through $\Pi(x,x)=0$ for every $x$, and, for every $y\ne x$,
$$
\Pi(x,y)=\frac{q(x,y)}{q(x)}\quad\text{with}\quad q(x)=-q(x,x)=\sum_{z\ne x}q(x,z).
$$
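As a small illustration of this formula, the sketch below computes $\Pi$ from a $Q$-matrix. The name `jump_matrix` and the example matrix are hypothetical. The formula needs $q(x)>0$; for an absorbing state ($q(x)=0$) the code sets $\Pi(x,x)=1$, which is one common convention and an assumption on my part, not something stated above.

```python
def jump_matrix(Q):
    """Return Pi with Pi[x][y] = q(x,y) / q(x) for y != x and Pi[x][x] = 0,
    where q(x) = -q(x,x)."""
    n = len(Q)
    Pi = [[0.0] * n for _ in range(n)]
    for x in range(n):
        qx = -Q[x][x]
        if qx == 0.0:
            # Assumed convention for absorbing states: stay put forever.
            Pi[x][x] = 1.0
            continue
        for y in range(n):
            if y != x:
                Pi[x][y] = Q[x][y] / qx
    return Pi


# Hypothetical Q-matrix: off-diagonal entries >= 0, rows sum to 0.
Q = [[-2.0, 1.0, 1.0],
     [0.5, -0.5, 0.0],
     [1.0, 3.0, -4.0]]

Pi = jump_matrix(Q)
for row in Pi:
    print(row)
```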
Let $(X_t)_{t\geq 0}$ be the process given by the second construction, driven by a Poisson process $(N_t)_{t\geq 0}$ of rate $\lambda$ and an independent DTMC $(Y_n)_{n\in\mathbb{N}}$ with transition matrix $Q$, whose off-diagonal entries are $g_{ij}/\lambda$.
Fix $h>0$. Let's start $X$ in state $i$ (i.e. $X_0=Y_0=i$) and consider the probability of it being at state $j$ (with $j\neq i$) at time $h$.
$$\begin{align}\mathbb{P}[X_h=j] &= \sum_{n\in\mathbb{N}} \mathbb{P}[X_h=j, N_h=n]\\
&= \mathbb{P}[X_h=j, N_h=1] + o(h)\\
&= \mathbb{P}[Y_1=j]\mathbb{P}[N_h=1] + o(h)\\
&= \frac{g_{ij}}{\lambda}(\lambda h)+ o(h)\\
&= g_{ij}h+o(h)
\end{align}
$$
This is one characterization of a CTMC with generator $G$.
Intuitive remarks: Since $\lambda$ cancels, its precise value doesn't matter. In the second construction more "jumps" occur, because of the condition on $\lambda$, but the DTMC is able to transition to its current state, and these jumps aren't observed in the resultant CTMC.
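The claim that $\lambda$ cancels can also be checked numerically. The sketch below sums $\mathbb{P}[N_t=n]\,(P^n)_{ij}$ over $n$, with $P=I+G/\lambda$, which is the first line of the computation above carried through without dropping the $o(h)$ terms. The function name `transition_matrix`, the truncation at 200 terms and the example generator are my own choices, used only for illustration.

```python
import math


def transition_matrix(G, lam, t, terms=200):
    """Approximate P[X_t = j | X_0 = i] by sum_n P[N_t = n] * (P^n)[i][j],
    where P = I + G / lam and N_t is Poisson with mean lam * t."""
    n = len(G)
    P = [[(1.0 if i == j else 0.0) + G[i][j] / lam for j in range(n)]
         for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    weight = math.exp(-lam * t)  # P[N_t = 0]
    total = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
        power = [[sum(power[i][m] * P[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        weight *= lam * t / (k + 1)  # P[N_t = k + 1] from P[N_t = k]
    return total


# Hypothetical generator: off-diagonal entries >= 0, rows sum to 0.
G = [[-2.0, 1.0, 1.0],
     [0.5, -0.5, 0.0],
     [1.0, 3.0, -4.0]]

A = transition_matrix(G, 4.0, 0.7)   # smallest admissible rate
B = transition_matrix(G, 10.0, 0.7)  # a larger rate
max_diff = max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))
print(max_diff)  # difference between the two choices of lam

h = 1e-4
ratio = transition_matrix(G, 4.0, h)[0][1] / h
print(ratio)  # should be close to g_01 = 1.0 for small h
```

The first printed number compares two choices of $\lambda$ at a fixed time; the second is the small-time ratio $\mathbb{P}[X_h=j]/h$, which the derivation says should approach $g_{ij}$.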
Presumably because not all jump processes are Markov jump processes.
Also, jump processes need not have a discrete state space. Take a compound Poisson process, for example: jumps happen at a fixed rate $\lambda$, but the jump size is not the constant 1. Instead it is drawn from a distribution, which may be continuous, so the state space is not discrete.
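For concreteness, here is a minimal simulation sketch of a compound Poisson path. The function name `compound_poisson_path`, the standard normal jump sizes and the parameter values are illustrative assumptions only.

```python
import random


def compound_poisson_path(rate, horizon, jump_sampler, rng):
    """Simulate jump times and values of a compound Poisson process on
    [0, horizon]: exponential waiting times with the given rate, and
    i.i.d. jump sizes drawn by jump_sampler(rng)."""
    t, value = 0.0, 0.0
    times, values = [0.0], [0.0]
    while True:
        t += rng.expovariate(rate)  # waiting time until the next jump
        if t > horizon:
            break
        value += jump_sampler(rng)
        times.append(t)
        values.append(value)
    return times, values


rng = random.Random(0)

# Continuous jump sizes: the path takes values in the real line.
times, values = compound_poisson_path(
    2.0, 5.0, lambda r: r.gauss(0.0, 1.0), rng)

# Constant jump size 1 recovers an ordinary Poisson counting process.
p_times, p_values = compound_poisson_path(2.0, 5.0, lambda r: 1.0, rng)

print(list(zip(times, values))[:5])
```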
There are also jump processes that have the independent increments property (and the Markov property) but are not compound Poisson processes. Combined with Brownian motion (and a deterministic drift), these account for all processes with stationary independent increments, which are called Lévy processes.
Note that continuous-time jump processes on a discrete state space and compound Poisson processes have paths of finite variation, but this is not necessarily the case for general Lévy processes, even those without a Brownian motion component, which are pure jump processes.
Of course, this is only a small class of Markovian jump processes, and I am sure there are plenty of others that I have not mentioned here. There are also Markov processes whose transition probabilities are time dependent, but these won't have the single $Q$-matrix as generator which you might have had in mind.