It is possible to get a general formula for the autocovariance function of a stationary ARMA(p,q) process. Suppose $X_t$ is a (zero mean) stationary solution of an ARMA(p,q) equation:
$$\phi(B)X_t=\theta(B)Z_t$$
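Multiply this equation by $X_{t-h}$ with $h>q$ and take expectations. For a causal solution, $X_{t-h}$ is uncorrelated with $Z_t,\dots,Z_{t-q}$ when $h>q$, so the right-hand side vanishes and, writing $r(h)$ for the autocovariance at lag $h$, you get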
$$r(h)-\phi_1r(h-1)-...-\phi_pr(h-p)=0$$
This is a linear recursive (difference) equation, which has a general solution. If all the roots $\lambda_i$ of the polynomial $\phi(z)=1-\phi_1z-...-\phi_pz^p$ are distinct,
$$r(h)=\sum_{i=1}^pC_i\lambda_i^{-h}$$
where $C_i$ are constants which can be derived from the initial conditions. Since $|\lambda_i|>1$ is needed to ensure stationarity, it is clear why the autocorrelation function (which is the autocovariance function scaled by a constant) decays geometrically, and hence rapidly unless some $|\lambda_i|$ is close to one.
I've covered the case of distinct roots of the polynomial $\phi(z)$; the remaining cases (repeated roots) are covered in the general theory, but the formulas are a bit messier. Nevertheless the terms $\lambda^{-h}$ remain.
Answers to questions 2 and 3 more or less follow from this formula. For an $AR(1)$ process $r(h)=c\phi_1^h$ (here $\lambda_1=1/\phi_1$), and when $\phi_1$ is close to one, i.e. close to non-stationarity, you get the behaviour you describe. The same goes for the general formula: if the process is nearly unit-root, one of the roots $\lambda_i$ is close to 1 and its term dominates the others, producing the slow decay.
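As a rough numerical check of the root formula, here is a minimal sketch for a pure AR(2) process (so $q=0$). The function names and the two coefficient pairs are my own illustrative choices, and I work with the autocorrelation $\rho(h)=r(h)/r(0)$, using the standard AR(2) initial conditions $\rho(0)=1$ and $\rho(1)=\phi_1/(1-\phi_2)$ to pin down the constants $C_i$. It assumes the two roots are distinct.

```python
import numpy as np

def ar2_acf_recursive(phi1, phi2, max_lag):
    """Autocorrelation of AR(2) via rho(h) = phi1*rho(h-1) + phi2*rho(h-2)."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    if max_lag >= 1:
        rho[1] = phi1 / (1.0 - phi2)
    for h in range(2, max_lag + 1):
        rho[h] = phi1 * rho[h - 1] + phi2 * rho[h - 2]
    return rho

def ar2_acf_from_roots(phi1, phi2, max_lag):
    """Same autocorrelation via rho(h) = C1*lam1**(-h) + C2*lam2**(-h)."""
    # Roots of phi(z) = 1 - phi1*z - phi2*z**2 (highest power first for np.roots).
    lam = np.roots([-phi2, -phi1, 1.0]).astype(complex)
    # Solve for C1, C2 from the conditions at h = 0 and h = 1.
    A = np.array([[1.0, 1.0], [1.0 / lam[0], 1.0 / lam[1]]], dtype=complex)
    b = np.array([1.0, phi1 / (1.0 - phi2)], dtype=complex)
    C = np.linalg.solve(A, b)
    h = np.arange(max_lag + 1, dtype=float)
    rho = C[0] * lam[0] ** (-h) + C[1] * lam[1] ** (-h)
    return rho.real, lam

# Roots well away from 1 versus one root very close to 1 (illustrative values).
rho_fast, lam_fast = ar2_acf_from_roots(0.5, 0.30, 20)
rho_slow, lam_slow = ar2_acf_from_roots(0.5, 0.49, 20)

print("roots (fast decay):", lam_fast.real)
print("roots (slow decay):", lam_slow.real)
print("rho(20), fast vs slow:", rho_fast[20], rho_slow[20])
```

The two ways of computing $\rho(h)$ should agree up to rounding, and the second parameter pair, whose smaller root sits just above 1, keeps a large autocorrelation at lag 20 while the first has almost died out.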
There are several different flavours of stationarity. The type described in your definition is weak-sense stationarity, also known as wide-sense stationarity, covariance stationarity, or second-order stationarity.
Your definition is not quite complete: a preliminary condition for weak stationarity is that the mean and covariance must exist and be finite, but this is satisfied here. As you noted, weak stationarity further requires the mean to be constant over time and $\operatorname{Cov}(X_{t+h},X_t)$ to be independent of $t$ for each $h$, i.e. the autocovariance at each lag $h$ is constant over time. The fact that $\mathbb{E}(X_t)=a+bt$ shows the first of these conditions is not met. Even the mean is not stationary.
Another form of stationarity is strong stationarity, also called strict stationarity or just stationarity. This requires that the joint distribution of $X_t$ taken at any $k$ times $t_1, t_2, \dots , t_k$ is the same when shifted by any lag $\tau$. Technically, for any $k$ and any $\tau$, and for any $t_1, t_2, \dots, t_k$ we require
$$F_X(x_{t_1+\tau}, \dots, x_{t_k+\tau}) = F_X(x_{t_1}, \dots, x_{t_k})$$
This implies not only that the mean and covariance at any given lag (if they exist) must stay constant over time, but that every conceivable property one can derive from the distribution is invariant under a time shift; in this sense its conditions are "stricter" (though note that if the mean or covariance is not finite, we can have a strongly stationary series which does not fulfil the preliminary condition for weak stationarity). Since your mean is not constant over time, your process can't be strongly stationary either. In general, so long as the mean and covariance exist, a process that is not weakly stationary will not be strongly stationary either. The converse is not true: a process that is not strongly stationary can still be weakly stationary.
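To make that last point concrete, here is a small simulation sketch of a process that is weakly but not strictly stationary. The construction is my own illustration, not something from your question: independent draws that are standard normal at even times and uniform on $[-\sqrt 3, \sqrt 3]$ at odd times. Both have mean 0 and variance 1, so the first two moments do not depend on $t$, but the marginal distributions differ, which shows up in the fourth moment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
t = np.arange(n)

normal_draws = rng.standard_normal(n)
# Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
uniform_draws = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)

# Even times: normal; odd times: uniform. All draws are independent.
x = np.where(t % 2 == 0, normal_draws, uniform_draws)

def summary(sample):
    """Return (mean, variance, kurtosis) of a sample."""
    m = sample.mean()
    v = sample.var()
    kurt = np.mean((sample - m) ** 4) / v ** 2
    return m, v, kurt

even_stats = summary(x[0::2])
odd_stats = summary(x[1::2])
lag1_autocov = np.mean(x[:-1] * x[1:])  # the mean is zero by construction

print("even times (mean, var, kurtosis):", even_stats)
print("odd times  (mean, var, kurtosis):", odd_stats)
print("lag-1 autocovariance:", lag1_autocov)
```

The means, variances and lag-1 autocovariance look the same across time, which is all weak stationarity asks for, while the kurtosis differs clearly between even and odd times, so the distribution is not shift-invariant.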
There is a sense in which your $X_t$ is "stationary": it is trend stationary.
This means that the trend in your time series can be expressed as a function of $t$; if we strip this trend out then what we are left with is a stationary process. In particular, $X_t$ is trend stationary if we can express it as
$$X_t = f(t) + Y_t$$
where $f$ is a deterministic function of time $t$ and $\{Y_t\}$ is a stationary process. In your case we can take $f(t)=a + bt$ and $Y_t = Z_t$; the idea is that by "stripping out" the trend $f(t)$ we would have $X_t - (a + bt) = Z_t$ which is stationary (because it's white noise).
Alternatively we could have taken $f(t)=bt$ and $Y_t= a + Z_t$.
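A minimal simulation sketch of this "stripping out" step, assuming Gaussian white noise for $Z_t$ and arbitrary illustrative values $a=2$, $b=0.05$ (neither comes from your question). The trend is estimated by an ordinary least-squares line fit rather than assumed known.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 2.0, 0.05, 2000          # illustrative values only
t = np.arange(n)
x = a + b * t + rng.standard_normal(n)   # X_t = a + b t + Z_t

# Estimate the deterministic trend f(t) = a + b t by least squares.
slope, intercept = np.polyfit(t, x, 1)
residual = x - (intercept + slope * t)   # estimate of Y_t = Z_t

half = n // 2
print("mean of X, first vs second half:",
      x[:half].mean(), x[half:].mean())
print("mean of detrended series, first vs second half:",
      residual[:half].mean(), residual[half:].mean())
print("estimated slope and intercept:", slope, intercept)
```

The raw series has very different means in the two halves, whereas the detrended series has a mean near zero in both, as you would expect from white noise.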
Note that although in this case we could take a process that didn't have a constant mean and subtract a deterministic trend to obtain a stationary process, that doesn't mean we can do this for any such process. Consider a random walk with drift, for instance. Here the mean is not constant over time, but we can't simply subtract a deterministic trend to obtain a stationary process. So a random walk with drift would not be a trend stationary process. See the question "Difference between series with drift and series with trend" for an illustration.
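Here is a short simulation sketch of why that fails, assuming a random walk $X_t = \delta + X_{t-1} + Z_t$ started at $X_0 = 0$ with Gaussian white noise; the drift $\delta = 0.05$ and the numbers of paths and steps are illustrative choices of mine. Subtracting the mean $\delta t$ leaves a sum of $t$ noise terms, whose variance grows with $t$.

```python
import numpy as np

rng = np.random.default_rng(2)
drift, n_paths, n_steps = 0.05, 2000, 1000   # illustrative values only

# Each row is one simulated path X_1, ..., X_n of the random walk with drift.
increments = drift + rng.standard_normal((n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

t = np.arange(1, n_steps + 1)
detrended = paths - drift * t    # subtract the deterministic mean E[X_t] = drift * t

# Variance across paths at an early and a late time point.
var_early = detrended[:, 9].var()     # t = 10
var_late = detrended[:, 999].var()    # t = 1000

print("variance of detrended walk at t=10:  ", var_early)
print("variance of detrended walk at t=1000:", var_late)
```

The mean of the detrended walk is zero at every $t$, but its variance is roughly $t$, so what remains after removing the trend is still not stationary.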
There can be some confusion of terms here depending on whether the adjective second-order is considered to be modifying stationary or random process (or both!). To some people,
- A second-order random process $\{X_t \colon t \in \mathbb T\}$ is one for which $E[X_t^2]$ is finite (indeed bounded) for all $t \in \mathbb T$. For us electrical engineers who apply (or mis-apply!) random process models in studying electrical signals, $E[X_t^2]$ is a measure of the average power delivered at time $t$ by a stochastic signal, and so all physically observable signals are modeled as second-order processes. Note that stationarity has not been mentioned at all and these second-order processes might or might not be stationary.
- A random process that is stationary to order $2$, which we can (but perhaps should not) call a second-order stationary random process provided we agree that second-order modifies stationary and not random process, is one for which $\mathbb T$ is a set of real numbers that is closed under addition, and the joint distribution of the random variables $X_t$ and $X_{t+\tau}$ (where $t, \tau \in \mathbb T$) depends on $\tau$ but not on $t$. As the link provided by AO shows, a random process stationary to order $2$ need not be strictly stationary. Nor is such a process necessarily wide-sense-stationary because there is no guarantee that $E[X_t^2]$ is finite: consider for example a strictly stationary process in which the $X_t$'s are independent Cauchy random variables.
- A second-order random process (meaning finite power as in the first item above) that is stationary to at least order $2$ is wide-sense-stationary.
OK, so that is the perspective from a different set of users of random process theory. For more details, see, for example, this answer of mine on dsp.SE.