The two constructions are equivalent and their equivalence is based on the so-called thinning of Poisson processes.
Klenke starts from a homogeneous Poisson process with a large rate $\lambda$. Amongst the times of this process, when the chain is at $x$, a relative proportion $1+q(x,x)/\lambda$ is used to jump from $x$ to $x$ and, for every $y\ne x$, a relative proportion $q(x,y)/\lambda$ is used to jump from $x$ to $y$. The jumps $x\to x$ have no effect, hence one is left with a proportion $q(x,y)/\lambda$ of jumps $x\to y$ amongst a global population of potential jump times with density $\lambda$, that is, the correct rate $q(x,y)$.
The only condition for this construction to work is $1+q(x,x)/\lambda\geqslant0$ for every $x$, that is, $\lambda\geqslant\sup\limits_x[-q(x,x)]$, hence one can choose, as many authors do, $\lambda=\sup\limits_x[-q(x,x)]$ but any larger value of $\lambda$ will do as well.
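The thinning construction above is easy to simulate. Here is a minimal sketch, using a small made-up three-state generator $Q$ (the matrix, states, and function name are illustrative, not from Klenke's text):

```python
import numpy as np

# Hypothetical 3-state generator: rows sum to 0, off-diagonal entries >= 0.
Q = np.array([[-2.0, 1.0, 1.0],
              [ 0.5, -1.0, 0.5],
              [ 1.0, 1.0, -2.0]])

rng = np.random.default_rng(0)
lam = max(-Q[x, x] for x in range(3))  # any lam >= sup_x[-q(x,x)] works

def sample_path(x0, T, lam, Q, rng):
    """Simulate the chain on [0, T] by thinning a rate-lam Poisson process."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        t += rng.exponential(1.0 / lam)   # next potential jump time
        if t > T:
            return path
        # jump kernel of the uniformized chain: row x of I + Q/lam
        probs = np.eye(3)[x] + Q[x] / lam
        x_new = rng.choice(3, p=probs)
        if x_new != x:                     # jumps x -> x have no effect
            x = x_new
            path.append((t, x))
```

Any $\lambda\geqslant\sup_x[-q(x,x)]$ gives the same law for the path; a larger $\lambda$ simply produces more of the fictitious $x\to x$ jumps that are then discarded.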
Norris's construction might be more usual hence I will not comment on it here, except to note that $\lambda$, the initial distribution in Norris, is related in no way whatsoever to $\lambda$, the positive real number in Klenke. (My impression is that Klenke's version, more elegant, is slowly replacing the other one in the probabilists' minds.)
Edit: The piece of Norris's construction missing from your account is that $\Pi$ is related to $Q$ through $\Pi(x,x)=0$ for every $x$, and, for every $y\ne x$,
$$
\Pi(x,y)=\frac{q(x,y)}{q(x)}\quad\text{with}\quad q(x)=-q(x,x)=\sum_{z\ne x}q(x,z).
$$
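The displayed formula translates directly into a computation of the jump matrix $\Pi$ from $Q$. A minimal sketch, reusing a hypothetical generator (not a matrix from Norris's book), and assuming no absorbing states, i.e. $q(x)>0$ for all $x$:

```python
import numpy as np

# Hypothetical 3-state generator, for illustration only.
Q = np.array([[-2.0, 1.0, 1.0],
              [ 0.5, -1.0, 0.5],
              [ 1.0, 1.0, -2.0]])

q = -np.diag(Q)            # holding rates q(x) = -q(x,x); assumed > 0
Pi = Q / q[:, None]        # off-diagonal entries become q(x,y)/q(x)
np.fill_diagonal(Pi, 0.0)  # Pi(x,x) = 0 by convention
```

Each row of `Pi` sums to $1$, so $\Pi$ is a genuine transition matrix: it is the transition matrix of the embedded discrete-time jump chain.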
First we can do some reasoning about why we use $\pi Q = 0$ to get the stationary distribution. According to the definition of stationary distribution, the equation we are to solve is $\pi P(t)=\pi$, where $P(t)$ is the transition matrix of the process.
One thing to note is that the $\mathbf{P}$ you mentioned is not the same as $P(t)$: the latter is a function of $t$, while $\mathbf{P}$ is the jump matrix. It is, as you said, an underlying discrete-time transition matrix.
Let's get back to the equation $\pi P(t)=\pi$. Obviously it is not easy to solve directly. Hence we take the derivative with respect to $t$: by the Kolmogorov forward equation $P'(t)=P(t)Q$, the LHS becomes $\pi P(t)Q$ and the RHS becomes $0$; evaluating at $t=0$, where $P(0)=I$, gives $\pi Q=0$.
To answer your second question: the question asked explicitly for $\mathbb{P}(X(t)=1)$ as $t\rightarrow \infty$, hence the distribution we want is actually the limiting distribution. Under the condition that $X$ is irreducible with a standard semigroup $\{P(t),t\geq 0\}$ of transition probabilities, the stationary distribution is also the limiting distribution.
Also, since this question concerns a two-state chain, the full balance equation $\pi Q = 0$ coincides with the detailed balance equation, indicating that the process is reversible.
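As a numerical sanity check, one can carry out this computation for a generic two-state chain. The rates below are made up, since the question's actual rates are not quoted here:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state generator with rates a (0 -> 1) and b (1 -> 0).
a, b = 3.0, 1.0
Q = np.array([[-a,  a],
              [ b, -b]])

# Solve pi Q = 0 together with sum(pi) = 1 as one linear system.
A = np.vstack([Q.T, np.ones(2)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]

# pi is also the limiting distribution: every row of e^{tQ} tends to pi.
P_large_t = expm(100.0 * Q)
```

For this chain $\pi=(b/(a+b),\,a/(a+b))$, and one can also check detailed balance directly: $\pi_0\,q(0,1)=\pi_1\,q(1,0)$.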
Best Answer
Since the process $X_s$ for $s\ge t$ only depends on the state $X_t$, and not on the path before time $t$, the state at discrete times $X_{kt}$ for integers $k\ge n$ only depends on the state $X_{nt}$ and not on the earlier states, which is precisely the Markov property for the discrete-time chain $(X_{kt})_{k\geq 0}$.
If $A_{ij}$ is the transition intensity $i\rightarrow j$ for $i\ne j$, define the vector $a$ by $a_i=\sum_{j\ne i} A_{ij}$ and the matrix $Q=A-D_{a}$, that is, $Q_{ij}=A_{ij}-a_i\delta_{ij}$, where $D_a$ is the diagonal matrix with entries $a_i$.
Now, the transition probability $(P_t)_{ij}$ of $i\rightarrow j$ over a time period $t$ is $$ P_t = e^{tQ} = \lim_{n\rightarrow\infty}\left(I+\frac{tQ}{n}\right)^n = I + tQ + \frac{t^2Q^2}{2!} + \cdots. $$
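This formula is easy to check numerically. A short sketch with a made-up two-state generator, comparing `scipy.linalg.expm` against the limit definition $(I+tQ/n)^n$ for a large finite $n$:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator: zero row sums, nonnegative off-diagonal entries.
Q = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])
t = 0.5

P_t = expm(t * Q)  # transition matrix over a time period t

# The limit definition approaches the same matrix (error is O(1/n)).
n = 10**6
P_approx = np.linalg.matrix_power(np.eye(2) + t * Q / n, n)
```

Since $Q$ has zero row sums, $P_t$ is a stochastic matrix for every $t\ge 0$: its entries are nonnegative and each row sums to $1$.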