The book you are reading is very likely referring to weak stationarity, as opposed to strict (strong) stationarity.
A time series is strictly stationary if all of its statistical properties remain the same under time shifts. In practice, strict stationarity is too limiting an assumption and rarely holds true. Much of the information about the joint distribution is contained in the mean and variance. Thus we often only require weak stationarity, which means that the mean and the covariance function (for each lag $h$) are independent of $t$, provided that the second moment is finite. (Note that, given a finite second moment, strict stationarity implies weak stationarity.)
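If you want to see this concretely, here is a minimal simulation sketch (Python with NumPy assumed; the stationary AR(1) with coefficient $0.5$ is just an arbitrary example process): under weak stationarity, different stretches of the sample path should give roughly the same mean and autocovariance estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stationary AR(1): x_t = 0.5 * x_{t-1} + eps_t, with |phi| < 1
n, phi = 100_000, 0.5
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def lag1_autocov(v):
    # Sample autocovariance at lag 1.
    v = v - v.mean()
    return np.mean(v[1:] * v[:-1])

# Weak stationarity: mean and autocovariance do not depend on t, so the
# two halves of the sample path should give similar estimates.
first, second = x[: n // 2], x[n // 2 :]
print(first.mean(), second.mean())                # both near 0
print(lag1_autocov(first), lag1_autocov(second))  # both near phi / (1 - phi**2) ~ 0.67
```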
Deterministic Trend
$$
y_t = \beta_0 + \beta_1 t + \epsilon_t
$$
where $\{\epsilon_t\}$ is white noise, for simplicity. The same discussion applies when $\{\epsilon_t\}$ is a covariance-stationary process (e.g. ARIMA with $d = 0$).
The process fluctuates randomly around the deterministic linear trend $\beta_0 + \beta_1 t$; hence the terminology "deterministic trend".
Such processes are also called trend-stationary: if you remove the linear trend, you recover the stationary process $\{\epsilon_t\}$.
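A minimal sketch of that detrending step (Python with NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Trend-stationary series: y_t = beta0 + beta1 * t + eps_t
n, beta0, beta1 = 500, 2.0, 1.0
t = np.arange(n)
y = beta0 + beta1 * t + rng.standard_normal(n)

# Remove the fitted linear trend (OLS of y on 1 and t); the residuals
# recover the stationary component {eps_t}.
slope, intercept = np.polyfit(t, y, deg=1)
detrended = y - (intercept + slope * t)
print(intercept, slope)                   # close to (2.0, 1.0)
print(detrended.mean(), detrended.std())  # close to (0, 1)
```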
Stochastic Trend
$$
y_t = \beta_0 + \beta_1 t + \eta_t
$$
where $\{\eta_t\}$ is a random walk, for simplicity. The same discussion applies when $\{\eta_t\}$ is an $I(1)$ process (e.g. ARIMA with $d = 1$).
Equivalently,
$$
y_t = y_0 + \beta_1 t + \sum_{s = 1}^{t} \epsilon_s
$$
where $\{\epsilon_t\}$ is the white noise driving the random walk $\{\eta_t\}$, and $y_0 = \beta_0 + \eta_0$ collects the initial conditions.
The "stochastic trend" terminology refers to $\eta_t$. The random walk is a highly persistent process, giving its sample path the appearance of a "trend".
Such processes are also called difference-stationary: if you take the first difference, you recover the stationary process $\{\epsilon_t\}$, i.e.
$$
\Delta y_t = \beta_1 + \epsilon_t,
$$
which is the same series (random walk with drift) from your second link.
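A minimal sketch of the differencing step (Python with NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stochastic trend: y_t = beta0 + beta1 * t + eta_t, eta_t a random walk
n, beta0, beta1 = 500, 2.0, 1.0
eps = rng.standard_normal(n)
y = beta0 + beta1 * np.arange(n) + np.cumsum(eps)

# First-differencing recovers Delta y_t = beta1 + eps_t, which is stationary.
dy = np.diff(y)
print(dy.mean(), dy.std())  # close to (beta1, 1)
```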
Visual Similarity
You can observe via simulation that the sample paths from these two models can be visually similar---e.g. choose $\beta_1 = 1$ and $\epsilon_t \stackrel{i.i.d.}{\sim} N(0,1)$.
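For instance, a minimal version of that simulation might look as follows (Python with NumPy and matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

n, beta1 = 200, 1.0
t = np.arange(n)

y_det = beta1 * t + rng.standard_normal(n)             # deterministic trend
y_sto = beta1 * t + np.cumsum(rng.standard_normal(n))  # stochastic trend

plt.plot(t, y_det, label="deterministic trend")
plt.plot(t, y_sto, label="stochastic trend")
plt.legend()
plt.show()  # the two sample paths are hard to tell apart by eye
```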
This is because the linear trend $\beta_0 + \beta_1 t$ dominates. More precisely, for both models
$$
\frac{y_t}{t} = \beta_1 + o_p(1).
$$
Only the slope term $\beta_1$ is not negligible in the limit. For the deterministic trend case, it is clear that $\frac{\epsilon_t}{t} = o_p(1)$.
For the stochastic trend case, $\frac{\eta_t}{\sqrt{t}}$ converges in distribution to a normal distribution (by the Central Limit Theorem), so $\frac{\eta_t}{t} = \frac{1}{\sqrt{t}} \cdot \frac{\eta_t}{\sqrt{t}} = o_p(1)$.
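A quick numerical check of this scaling (Python with NumPy; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# eta_t / sqrt(t) is approximately N(0, 1) for large t (CLT), so
# eta_t / t = (eta_t / sqrt(t)) / sqrt(t) shrinks toward zero.
t, reps = 2_500, 2_000
eta_t = rng.standard_normal((reps, t)).sum(axis=1)  # random-walk endpoints

print(np.std(eta_t / np.sqrt(t)))  # close to 1
print(np.std(eta_t / t))           # close to 1 / sqrt(t) = 0.02
```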
Statistical Testing
The visual similarity of sample paths motivates the problem of statistically distinguishing these two models. This is the purpose of unit root tests---e.g. the (Augmented) Dickey-Fuller test, which is historically the first such test.
For the ADF test, you basically take the detrended series $\tilde{y}_t$ (the residuals from regressing $y_t$ on $1$ and $t$), run the regression
$$
\Delta \tilde{y}_t = \alpha \tilde{y}_{t-1} + \tilde{\epsilon}_t,
$$
and consider the $t$-statistic for $\alpha = 0$ (the augmented version adds lagged differences $\Delta \tilde{y}_{t-j}$ as extra regressors to absorb serial correlation). If the $t$-statistic is sufficiently negative (below the appropriate Dickey-Fuller critical value), you reject the null of a stochastic trend.
The empirical reasoning behind the ADF test is simple. Even though the sample paths themselves are similar, the detrended series would look quite different. Under trend-stationarity, the detrended series would appear stationary. On the other hand, if a difference-stationary model is mistakenly detrended, the detrended series would not appear stationary.
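In practice you would not run this regression by hand; for example, statsmodels ships an ADF implementation. A minimal sketch applying it to the two simulated models (the expected outcomes are noted in the comments, though any single simulated draw can of course deviate):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)

n, beta1 = 500, 1.0
t = np.arange(n)
y_det = beta1 * t + rng.standard_normal(n)             # trend-stationary
y_sto = beta1 * t + np.cumsum(rng.standard_normal(n))  # difference-stationary

# regression="ct" includes a constant and a linear trend in the test
# regression, matching the detrending step described above.
for name, y in [("deterministic", y_det), ("stochastic", y_sto)]:
    stat, pvalue = adfuller(y, regression="ct")[:2]
    print(name, round(stat, 2), round(pvalue, 3))

# Expected pattern: a very negative statistic / tiny p-value for the
# trend-stationary series (reject the unit-root null), and a p-value far
# from zero for the stochastic-trend series (fail to reject).
```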
Best Answer
This is most easily understood if you have some experience with probability theory. Mathematically speaking, a random variable is a measurable function from some background probability space into another space. It is just a function. Say we're measuring the height of the next person passing by my window. We could define a random variable from the background space into the real numbers and use it to describe this height. At this point we just have a random variable. When somebody actually passes by, we get a real number: a realization of that random variable. Let's say that realization is 1.9 m. It could have been infinitely many other numbers.
Your situation is the same. You have a background probability space and a function from that space into some other space. This time we call the function a stochastic process (because its values are indexed by time, for instance), and a realization of the stochastic process is now called an observed time series. Apart from the names, the situation is as above.
So to answer your questions more directly: think of the ensemble as the set of all the (observed) time series you could possibly see as realizations of a stochastic process; the one you actually observe is just one of them. This is why every member of the ensemble is a possible realization of the process.
The background space is a mathematical construction used to define random variables. Another way to think of it is that if we could repeat our observation, we might have seen another time series. Hence, the same stochastic process can give rise to multiple realizations (observed time series, i.e. members of the ensemble). The observed time series are the numbers you actually observe, while the stochastic process is a mathematical construction explaining where the numbers came from. We are often interested in gaining knowledge about the stochastic process, but we only have the information in the observed time series.
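If it helps to make the ensemble idea concrete, here is a minimal sketch (Python with NumPy; the random walk is just an arbitrary example process):

```python
import numpy as np

rng = np.random.default_rng(6)

# One stochastic process, many realizations: each row below is one member
# of the ensemble; an "observed time series" is just one such row.
n_realizations, n_time = 5, 100
ensemble = np.cumsum(rng.standard_normal((n_realizations, n_time)), axis=1)

observed = ensemble[0]  # the single path you happen to have observed
print(ensemble.shape)   # (5, 100): five possible histories of one process
```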