The two constructions are equivalent and their equivalence is based on the so-called thinning of Poisson processes.
Klenke starts from a homogeneous Poisson process with a large rate $\lambda$. Among the times of this process, when the chain is at $x$, a relative proportion $1+q(x,x)/\lambda$ is used to jump from $x$ to $x$ and, for every $y\ne x$, a relative proportion $q(x,y)/\lambda$ is used to jump from $x$ to $y$. The jumps $x\to x$ have no effect, hence one is left with a proportion $q(x,y)/\lambda$ of jumps $x\to y$ among a global population of potential jump times with intensity $\lambda$, that is, with the correct rate $q(x,y)$.
The only condition for this construction to work is that $1+q(x,x)/\lambda\geqslant0$ for every $x$, that is, $\lambda\geqslant\sup\limits_x[-q(x,x)]$, hence one can choose, as many authors do, $\lambda=\sup\limits_x[-q(x,x)]$, but any larger value of $\lambda$ will do as well.
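This thinning recipe is concrete enough to simulate directly. Below is a minimal Python sketch, assuming a hypothetical three-state generator matrix `Q` (the rates are made up for illustration): each potential jump time of a rate-$\lambda$ Poisson process proposes a move with probabilities $1+q(x,x)/\lambda$ (stay) and $q(x,y)/\lambda$ (move), and the fictitious jumps $x\to x$ are simply discarded.

```python
import random

# Hypothetical 3-state generator matrix Q; each row sums to 0.
Q = [[-2.0, 1.5, 0.5],
     [1.0, -1.0, 0.0],
     [0.5, 0.5, -1.0]]

# Any lambda >= sup_x [-q(x,x)] works; take the smallest admissible one.
lam = max(-Q[x][x] for x in range(len(Q)))

def step(x):
    """One potential jump of the uniformized chain: stay at x with
    probability 1 + q(x,x)/lam, move to y != x with probability q(x,y)/lam."""
    probs = [Q[x][y] / lam + (1.0 if y == x else 0.0) for y in range(len(Q))]
    return random.choices(range(len(Q)), weights=probs)[0]

def simulate(x0, t_max):
    """Simulate the chain on [0, t_max] by thinning a rate-lam Poisson process."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        t += random.expovariate(lam)   # next potential jump time
        if t > t_max:
            return path
        y = step(x)
        if y != x:                     # jumps x -> x have no effect
            path.append((t, y))
            x = y
```

Note that with this choice of `lam` the state with the fastest exit rate has zero probability of a fictitious self-jump, while slower states pad their exit rate with self-jumps, which is exactly what makes the single Poisson clock work for all states at once.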
Norris's construction might be more usual, hence I will not comment on it here, except to note that $\lambda$, the initial distribution in Norris, is in no way related to $\lambda$, the positive real number in Klenke. (My impression is that Klenke's version, being more elegant, is slowly replacing the other one in probabilists' minds.)
**Edit:** The piece of Norris's construction missing from your account is that $\Pi$ is related to $Q$ through $\Pi(x,x)=0$ for every $x$ and, for every $y\ne x$,
$$
\Pi(x,y)=\frac{q(x,y)}{q(x)}\quad\text{with}\quad q(x)=-q(x,x)=\sum_{z\ne x}q(x,z).
$$
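As a sanity check on these formulas, here is a small sketch that computes the jump matrix $\Pi$ from a hypothetical three-state generator $Q$ (the specific rates are made up for illustration):

```python
# Hypothetical 3-state generator Q; each row sums to 0.
Q = [[-2.0, 1.5, 0.5],
     [1.0, -1.0, 0.0],
     [0.5, 0.5, -1.0]]

def jump_matrix(Q):
    """Jump-chain matrix: Pi(x,x) = 0 and Pi(x,y) = q(x,y)/q(x) for y != x,
    where q(x) = -q(x,x) is the total exit rate out of x."""
    n = len(Q)
    Pi = [[0.0] * n for _ in range(n)]
    for x in range(n):
        qx = -Q[x][x]
        for y in range(n):
            if y != x and qx > 0:
                Pi[x][y] = Q[x][y] / qx
    return Pi
```

Each row of $\Pi$ sums to $1$ precisely because $q(x)=\sum_{z\ne x}q(x,z)$, i.e. because the rows of $Q$ sum to zero.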
There are many questions here, and some of them should perhaps be split into their own posts. But to address the title question, consider the Ornstein–Uhlenbeck process, which can be defined by the SDE $dX_t = dB_t - X_t \,dt$, or in various other equivalent ways. (In Wikipedia's notation, I am using the parameters $\mu = 0$ and $\theta = \sigma = 1$.) You can picture it as a particle which is trying to perform Brownian motion, but is connected to a spring which pulls it back toward the origin.
It is a continuous Gaussian process which is strong Markov, but its increments are not independent; they are negatively correlated. (In particular, if you "proved" that a Gaussian process is Markov iff it has independent increments, then you made an error.)
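The negative correlation of consecutive increments can be checked empirically. Here is a sketch of my own (not from any particular reference) that samples the OU process with $\mu=0$, $\theta=\sigma=1$ using its exact Gaussian one-step transition, starts from the stationary law $N(0,1/2)$, and estimates the covariance of two consecutive unit-length increments:

```python
import math
import random

random.seed(0)

def ou_path(n_steps, h):
    """Sample an OU path (mu = 0, theta = sigma = 1) at times 0, h, 2h, ...
    via the exact transition X_{t+h} = e^{-h} X_t + N(0, (1 - e^{-2h})/2),
    with X_0 drawn from the stationary law N(0, 1/2)."""
    x = random.gauss(0.0, math.sqrt(0.5))
    a = math.exp(-h)                        # mean-reversion factor per step
    s = math.sqrt((1.0 - a * a) / 2.0)      # conditional standard deviation
    path = [x]
    for _ in range(n_steps):
        x = a * x + s * random.gauss(0.0, 1.0)
        path.append(x)
    return path

# Covariance of the increments over [0, 1] and [1, 2], over many paths.
incs = [(p[1] - p[0], p[2] - p[1]) for p in (ou_path(2, 1.0) for _ in range(20000))]
mu_u = sum(u for u, v in incs) / len(incs)
mu_v = sum(v for u, v in incs) / len(incs)
cov = sum((u - mu_u) * (v - mu_v) for u, v in incs) / len(incs)
print(cov)  # theory: -(1 - e^{-1})^2 / 2, about -0.20
```

A short stationary-covariance computation gives the theoretical value: with $\operatorname{Cov}(X_s,X_t)=\tfrac12 e^{-|t-s|}$, one finds $\operatorname{Cov}(X_1-X_0,\,X_2-X_1)=-\tfrac12(1-e^{-1})^2<0$, so the sample estimate should come out clearly negative.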
Discrete-time, continuous-state Markov processes are widely used; autoregressive processes are a very important example.
Actually, if you relax the Markov property and look at discrete-time, continuous-state stochastic processes in general, this is the subject of a huge part of time series analysis and signal processing.
The most famous examples are the ARMA processes, the conditionally heteroscedastic models (ARCH, GARCH), and a large subclass of hidden Markov models.
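For concreteness, here is a minimal AR(1) sketch (the parameter values are arbitrary): the next value depends on the past only through the current value, so this is a Markov process in discrete time with state space $\mathbb{R}$.

```python
import random

def ar1_path(n, phi=0.6, sigma=1.0, x0=0.0):
    """AR(1) recursion X_{t+1} = phi * X_t + sigma * eps_t with iid standard
    normal eps_t: a discrete-time Markov process with continuous state space."""
    x, path = x0, [x0]
    for _ in range(n):
        x = phi * x + sigma * random.gauss(0.0, 1.0)
        path.append(x)
    return path
```

With $|\phi|<1$ the process is stationary in the limit; with `sigma = 0` the recursion degenerates to deterministic geometric decay, which makes the Markov structure particularly transparent.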