It is a well-known fact that if $A$ is Hurwitz stable, then $\exp(Ah)$ is Schur stable for all $h>0$.
This can be seen from the definition of the exponential. Assume for simplicity that $A$ is diagonalizable. Then, there exists an invertible matrix $P$ such that $A=PDP^{-1}$ where $D$ is a diagonal matrix that contains the eigenvalues of $A$ on the diagonal.
Then, we have that
$$\exp(Ah)=\sum_{i=0}^\infty\dfrac{A^ih^i}{i!}=\sum_{i=0}^\infty\dfrac{(PDP^{-1})^ih^i}{i!}=P\sum_{i=0}^\infty\dfrac{D^ih^i}{i!}P^{-1}=P\exp(Dh)P^{-1}.$$
We can conclude that the eigenvalues of $\exp(Ah)$ are $e^{\lambda_i h}$, where the $\lambda_i$'s are the eigenvalues of $A$. Therefore, if every eigenvalue of $A$ has negative real part, then $|e^{\lambda_i h}|=e^{h\,\mathrm{Re}(\lambda_i)}<1$, so all the eigenvalues of $\exp(Ah)$ lie within the open unit disc. The general case (using the Jordan form instead of the diagonalization) can be proven analogously.
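This is easy to check numerically. A minimal sketch with a hypothetical Hurwitz-stable $A$ (the matrix and step size below are illustrative choices, not from the question):

```python
import numpy as np
from scipy.linalg import expm

# Hurwitz-stable A: all eigenvalues (-1 and -3) have negative real part.
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
h = 0.5

M = expm(A * h)

# Eigenvalues of exp(A h) are exp(lambda_i * h), hence inside the unit disc.
eig_A = np.linalg.eigvals(A)
eig_M = np.linalg.eigvals(M)
print(np.sort_complex(np.exp(eig_A * h)))  # matches eig_M up to ordering
print(np.max(np.abs(eig_M)) < 1.0)         # True: exp(A h) is Schur stable
```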
Regarding your second question: you should model your sampled-data system using a hybrid formulation so that you do not need to discretize and handle the difficult integral term that you have. This may seem more complex, but it is not, and the hybrid systems formulation actually circumvents a number of problems:
- We preserve the inter-sample behavior which is lost when using discretization. This is particularly important when performance measures on the continuous-time system need to be considered.
- We preserve the structure of the problem (no matrix exponentials appear), which means the approach remains valid when $A$ is uncertain and/or time-varying.
For instance, you can express your system as an impulsive system of the form
$$\begin{array}{rcll}
dz(t)&=&\bar A z(t)dt + Edw(t)&,\ t\ne t_k\\
z(t_k^+)&=&Jz(t_k)&,\ t=t_k
\end{array}$$
where $t_k=kh$, $z=(x,u)$, $\bar{A}=\begin{bmatrix}A & B\\0 & 0\end{bmatrix}$, $E=\begin{bmatrix}C\\0\end{bmatrix}$, and $J=\begin{bmatrix}I & 0\\K & 0\end{bmatrix}$.
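As a quick sanity check, those block matrices can be assembled and the noise-free jump-flow recursion simulated. All numbers below are hypothetical (a scalar plant $\dot x = x + u$ with gain $K=-2$), chosen only to illustrate the structure:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical scalar plant x' = x + u with u = K x applied at sampling
# instants: A = 1, B = 1, K = -2, sampling period h = 0.2.
A = np.array([[1.0]]); B = np.array([[1.0]]); K = np.array([[-2.0]])
h = 0.2

Abar = np.block([[A, B],
                 [np.zeros((1, 1)), np.zeros((1, 1))]])
J = np.block([[np.eye(1), np.zeros((1, 1))],
              [K, np.zeros((1, 1))]])

# Noise-free jump-flow recursion: flow over one period, then jump at t_k.
z = J @ np.array([1.0, 0.0])          # initial jump sets u(0^+) = K x(0)
for _ in range(50):
    z = J @ expm(Abar * h) @ z
print(np.linalg.norm(z))              # decays toward 0: sampled loop is stable
```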
If you want to look at the dynamics of the moments, then one can see that
$$\begin{array}{rcll}
\dfrac{d}{dt}\mathbb{E}[z(t)]&=&\bar A \mathbb{E}[z(t)]&,\ t\ne t_k\\
\mathbb{E}[z(t_k^+)]&=&J\mathbb{E}[z(t_k)]&,\ t=t_k
\end{array}$$
which is going to be exponentially stable if and only if $$\exp(\bar Ah)J=\begin{bmatrix}\exp(Ah) & \Psi(h)\\0 & I \end{bmatrix}\begin{bmatrix}I & 0\\K & 0 \end{bmatrix}=\begin{bmatrix}\exp(Ah)+\Psi(h)K & 0\\K & 0 \end{bmatrix}$$ is Schur stable, where $$\Psi(h)=\int_0^h\exp(As)B\,ds.$$
This will be the case if and only if $\exp(Ah)+\Psi(h)K$ is Schur stable. If we assume that $A+BK$ is Hurwitz, then $\exp(Ah)+\Psi(h)K$ will be Schur stable for any sufficiently small $h>0$. To see this, just observe that a Taylor expansion at $h=0$ of $\exp(Ah)+\Psi(h)K$ yields
$$\exp(Ah)+\Psi(h)K=I+h(A+BK)+o(h)$$ from which the result follows.
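This can be verified numerically. A convenient trick is that the block identity $\exp(\bar A h)=\begin{bmatrix}\exp(Ah)&\Psi(h)\\0&I\end{bmatrix}$ delivers $\Psi(h)$ from a single matrix exponential. The plant, gain, and step size below are hypothetical, chosen so that $A$ is unstable but $A+BK$ is Hurwitz:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical data: an unstable plant with a stabilizing state-feedback gain.
A = np.array([[0.0, 1.0],
              [2.0, -1.0]])          # eigenvalues 1 and -2: unstable
B = np.array([[0.0],
              [1.0]])
K = np.array([[-6.0, -2.0]])         # chosen so that A + B K is Hurwitz
assert np.all(np.linalg.eigvals(A + B @ K).real < 0)

h = 0.1

# exp(Abar h) = [[exp(A h), Psi(h)], [0, I]] with Psi(h) = int_0^h exp(A s) B ds,
# so one matrix exponential yields both blocks at once.
n, m = B.shape
Abar = np.block([[A, B], [np.zeros((m, n)), np.zeros((m, m))]])
E = expm(Abar * h)
eAh, Psi = E[:n, :n], E[:n, n:]

rho = np.max(np.abs(np.linalg.eigvals(eAh + Psi @ K)))
print(rho < 1.0)  # True: Schur stable for this (sufficiently small) h
```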
For the second-order moments, we have that
$$\begin{array}{rcll}
\dfrac{d}{dt}\mathbb{E}[z(t)z(t)^T]&=&\bar A \mathbb{E}[z(t)z(t)^T]+\mathbb{E}[z(t)z(t)^T]\bar A^T+EE^T&,\ t\ne t_k\\
\mathbb{E}[z(t_k^+)z(t_k^+)^T]&=&J\mathbb{E}[z(t_k)z(t_k)^T]J^T&,\ t=t_k
\end{array}$$
and the system is stable provided that the operator $L:\mathbb{S}_{\succeq0}^n\to\mathbb{S}_{\succeq0}^n$ defined as
$$L(X)=J\exp(\bar Ah)X\exp(\bar A^Th)J^T$$
has its spectrum inside the unit disc. Assuming that $J\exp(\bar Ah)$ is diagonalizable, it turns out that the eigenvalues of this operator are the pairwise products $\lambda_i\lambda_j$, where the $\lambda_i$'s are the eigenvalues of $J\exp(\bar Ah)$; in particular, its spectral radius equals $\max_i|\lambda_i|^2$.
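A small sketch of this fact: vectorizing $X \mapsto MXM^T$ turns the operator into the matrix $M \otimes M$. The triangular $M$ below is only a stand-in for $J\exp(\bar Ah)$, picked so its eigenvalues can be read off the diagonal:

```python
import numpy as np

# Toy stand-in for M = J exp(Abar h); upper triangular, so its eigenvalues
# (0.5, 0.3, -0.4) sit on the diagonal.
M = np.array([[0.5, 0.2, 0.0],
              [0.0, 0.3, 0.1],
              [0.0, 0.0, -0.4]])

# Vectorizing X -> M X M^T yields kron(M, M), whose eigenvalues are all
# pairwise products lambda_i * lambda_j of the eigenvalues of M.
lam = np.linalg.eigvals(M)
op_eigs = np.linalg.eigvals(np.kron(M, M))

products = np.sort([li * lj for li in lam for lj in lam])
print(np.allclose(np.sort(op_eigs), products))                      # True

# The spectral radius of the operator is therefore max_i |lambda_i|^2:
print(np.isclose(np.max(np.abs(op_eigs)), np.max(np.abs(lam))**2))  # True
```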
Of course, it is possible to do some continuous-time analysis using looped functionals, Lyapunov functionals, clock-/timer-dependent Lyapunov functions, etc. Those methods are used to design controllers that guarantee that the closed-loop system will be stable for some range of values of the sampling period, or even deal with the aperiodic case where sampling is not periodic. They can also be used to find the largest admissible sampling period, or even to find the controller that maximizes that largest admissible sampling period.
For resources, you may look at the book by Goebel, Sanfelice and Teel on hybrid systems.
There is also an issue with your formulation: it will not work when $A$ is unstable, even though it is perfectly possible to stabilize an unstable system with a sampled-data control law.
If you require a root locus procedure, I'm sure any Introductory Control textbook will cover this. For instance, pg. 464 (Example 7.11) of Modern Control Systems (13th Ed) by Dorf & Bishop covers a procedure in which the root locus is used to solve almost the exact same problem you propose: using a PI control law against a standard first order system to meet certain response characteristics.
The basic idea is this. The root locus equation, with $K$ as the variable parameter, is
$$ 0 = 1 + K L(s) $$
where
$$L(s) = \frac{s + z}{s\,(s + B)},$$
absorbing the gain of the plant into $K.$ If you sketch the root locus for this, you will find that as long as $-z$ lies farther to the left than $-B$, the root locus looks something like the figure below.
From this, we can deduce that as long as we place our zero $-z$ somewhere faster than the settling time specification, there exist poles on the root locus that should meet the settling time specification (because they have more negative real part) and meet the overshoot specification (because they converge upon the real axis); we must simply choose a sufficiently large gain $K.$
There is a catch, however. With a PI controller, you introduce a zero in the closed loop transfer function. This zero can add overshoot that scales with the speed of your response. You may need to scale the gain to wash out the effects of the zero in some designs.
In your case, you picked $z = -48$, which is more than sufficient to meet your $200~\mathrm{ms}$ settling time spec, but I think this is too aggressive. It should still be possible to find a gain $K$, but it may be difficult to do so, and you will need a larger gain to achieve that end. For that choice of $z$, $K = 10000$ seems to work:
$$C(s) = 10000\,\frac{s + 48}{s}.$$
I would instead pick $z = -20$, near the settling specification. The root locus then still has solutions that meet the settling time specification, as seen in the diagram, and we just need to choose a gain $K$ large enough to meet the overshoot specification while still not having poles that are too fast. It turns out that
$$C(s) = 4000\,\frac{s + 20}{s}$$
does the trick.
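The closed-loop poles behind that claim are easy to compute by hand or numerically: the characteristic equation $0 = 1 + K\,(s+z)/(s(s+B))$ is just $s^2 + (B+K)s + Kz = 0$. Since the plant pole $-B$ is not stated here, $B = 30$ below is a purely hypothetical value for illustration:

```python
import numpy as np

# Check the final design C(s) = 4000 (s + 20) / s. The plant pole -B is not
# given in the question, so B = 30 is a hypothetical value for illustration.
B = 30.0
K, z = 4000.0, 20.0

# Closed loop: 0 = 1 + K (s + z) / (s (s + B))  =>  s^2 + (B + K) s + K z = 0
poles = np.roots([1.0, B + K, K * z])
print(poles)

# Both poles are real (no oscillatory overshoot from the poles themselves);
# the dominant one sits near the zero at -20, close to the settling-time band,
# and is largely cancelled by that zero.
```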
Unfortunately, there is no real resource that will give you a proof for that. This is more of a rule of thumb that people found empirically and passed on to the next generation. The idea is to have enough sampling periods before the system settles, but not too many, for various reasons such as computational power, noise, etc.
You cite the Nyquist theorem, but this is a signal-processing result about the possibility of reconstructing an analog signal from its samples. In control, you do not really care much about that: all you need is enough information to control/stabilize the system, and enough bandwidth for the closed-loop system. Moreover, signals in control are not bandlimited, unless you filter them beforehand, of course.
For nonlinear systems, this is even worse, because there is no such thing as a single rise time: it depends on the initial conditions and on the input you apply to the system, so even assuming zero initial conditions does not help. It is therefore up to the control engineer to decide what a good sampling period is. In general, this will be dictated by how fast you want the closed-loop system to be and by what actuators, sensors, and controllers/processors you have.