Perhaps a little more than a hint but here goes...
The question appears to be asking if
$$\begin{align}
p &= P\left\{\bar X - 1.96\sqrt{\frac{1}{n}} \le X_{n+1} \le \bar X + 1.96\sqrt{\frac{1}{n}} \right\} \\
&= P\left\{-1.96\sqrt{\frac{1}{n}} \le X_{n+1} - \bar X \le 1.96\sqrt{\frac{1}{n}} \right\}
\end{align}$$
is less than, equal to, or greater than $0.95$.
Now, it turns out $X_{n+1} - \bar X$ (itself a linear combination of normal random variables) is also normal, with mean $0$ and variance $1+\frac{1}{n}$ (due to independence of $X_{n+1}$ from the other $X_i$ and hence from $\bar X$).
Then we have that
$$P\left\{ -1.96\sqrt{1+\frac{1}{n}} \le X_{n+1} - \bar X \le 1.96\sqrt{1+\frac{1}{n}} \right\} = 0.95$$
as well.
Now take a look at the interval above and notice that
$$
\left[ -1.96\sqrt{1+\frac{1}{n}}, \ \ 1.96\sqrt{1+\frac{1}{n}} \ \right]
= \left[ -1.96\sqrt{1+\frac{1}{n}}, \ \ -1.96\sqrt{\frac{1}{n}} \ \right) \\
\bigcup \color{red}{ \left[ -1.96\sqrt{\frac{1}{n}}, \ \ 1.96\sqrt{\frac{1}{n}} \ \right]} \\
\bigcup \left( 1.96\sqrt{\frac{1}{n}}, \ \ 1.96\sqrt{1+\frac{1}{n}} \ \right]
$$
Can you deduce where $p$ stands relative to $0.95$?
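As a quick sanity check (my own sketch, not part of the original hint), a short simulation estimates $p$ directly; the choices $n = 10$ and standard normal $X_i$ are arbitrary illustrations:

```python
import numpy as np

# Illustrative sketch: n = 10 and X_i ~ N(0, 1) are arbitrary choices.
rng = np.random.default_rng(0)
n, trials = 10, 100_000
x = rng.normal(size=(trials, n))     # each row is a sample X_1, ..., X_n
x_next = rng.normal(size=trials)     # X_{n+1}, independent of each sample
half = 1.96 * np.sqrt(1 / n)         # half-width of the proposed interval
p_hat = np.mean(np.abs(x_next - x.mean(axis=1)) <= half)
print(p_hat)  # noticeably below 0.95 (around 0.45 for n = 10)
```

The estimate lands well below $0.95$, which matches the decomposition above: the red sub-interval sits strictly inside the interval that has exactly 95% probability.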
While the article you refer to correctly defines the concept of a confidence interval (your highlighted text), it does not correctly treat the case of a normal distribution with unknown standard deviation. You may want to search for "Neyman confidence interval" to see an approach that produces confidence intervals with the property you highlighted.
The Neyman procedure selects, for each true value of the parameter of interest, a region containing 95% of outcomes. The confidence interval is then the union of all parameter values for which the observation lies within the selected region. The probability that the observation falls within the selected region for the true parameter value is 95%, and only for those observations will the confidence interval contain the true value. The procedure therefore guarantees the property you highlight.
If the standard deviation is known and not a function of the mean, the Neyman central confidence intervals turn out to be identical to those described in the article.
Thank you for the link to Neyman's book - interesting to read from the original source! You ask for a simple description, but that is what my second paragraph was meant to be. Perhaps a few examples will help illustrate: Examples 1 and 1b could be considered trivial, whereas Example 2 would not be handled correctly by the article you refer to.
Example 1. Uniform random variable. Let X follow a uniform distribution,
$$f(x)=1/2 {\mathrm{\ \ for\ \ }}\theta-1\le x\le \theta+1 $$ and zero otherwise.
We can make a 100% confidence interval for $\theta$ by considering all possible outcomes $x$, given $\theta$, i.e. $x \in [\theta-1,\theta+1]$. Now consider an observed value, $x_0$. The union of all values of $\theta$ for which $x_0$ is a possible outcome is $[x_0-1,x_0+1]$. That is the 100% confidence interval for $\theta$ for this problem.
Example 1b. Uniform random variable. Let X follow the same uniform distribution. We can make a 95% central confidence interval for $\theta$ by selecting the 95% central outcomes $x$, given $\theta$, i.e. $x \in [\theta-0.95,\theta+0.95]$. Now consider an observed value, $x_0$. The union of all values of $\theta$ for which $x_0$ is within the selected range is $[x_0-0.95,x_0+0.95]$. That is the 95% confidence interval for $\theta$ for this problem.
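The 95% coverage in Example 1b is easy to verify by simulation (a sketch; the true value $\theta = 3.7$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, trials = 3.7, 200_000                         # arbitrary true parameter
x0 = rng.uniform(theta - 1, theta + 1, size=trials)  # draws from the pdf
covered = np.mean((x0 - 0.95 <= theta) & (theta <= x0 + 0.95))
print(covered)  # close to 0.95, for any choice of theta
```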
Example 2. Uniform random variable. Let X follow a uniform distribution,
$$f(x)=1/\theta {\mathrm{\ \ for\ \ }}{1\over2}\theta \le x \le {3\over2}\theta $$ and zero otherwise. We can make a 100% confidence interval for $\theta$ by considering all possible outcomes $x$, given $\theta$, i.e. $x \in [{1\over2}\theta,{3\over2}\theta]$. Now consider an observed value, $x_0$. The union of all values of $\theta$ for which $x_0$ is a possible outcome is $[{2\over3}x_0,2x_0]$. That is the 100% confidence interval for $\theta$ for this problem. (You can confirm this by inserting the endpoints of the confidence interval into the pdf and seeing that they lie at the boundaries of its support.) Note that the central confidence interval is not centered on the point estimate for $\theta$, $\hat\theta = x_0$.
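The 100% coverage in Example 2 can likewise be checked numerically (a sketch; $\theta = 5$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, trials = 5.0, 100_000                           # arbitrary true parameter
x0 = rng.uniform(0.5 * theta, 1.5 * theta, size=trials)
covered = np.mean((2 * x0 / 3 <= theta) & (theta <= 2 * x0))
print(covered)  # 1.0: the interval contains theta in every trial
```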
Example 3. Normal distribution with mean $\theta$ and standard deviation $1$. The 68% central confidence interval would be constructed identically to Example 1; that is, the selected region for $X$ would be $[\theta-1,\theta+1]$. The 68% central confidence interval is therefore the same as in Example 1, $[x_0-1,x_0+1]$. You can extend this to 95% and an arbitrary KNOWN standard deviation $\sigma$ to get $[x_0-1.96\sigma,x_0+1.96\sigma]$.
Example 4. Normal distribution with mean $\theta$ and standard deviation $\theta/2$. The 68% central confidence interval would be constructed identically to Example 2. The 68% central confidence interval for $\theta$ is therefore the same as in Example 2, $[{2\over3}x_0,2x_0]$.
The authors of the article you refer to and the other commenters on your question would not get Examples 2 or 4 right. Only by following a procedure like Neyman's will the confidence interval have the property that you highlighted in your post. The other methods are approximations for the general problem of building confidence intervals.
The exact solution to the problem with a normal distribution and UNKNOWN standard deviation is more difficult to work out than the examples above.
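For reference, the standard exact construction in that case replaces the normal critical value with one from Student's $t$ distribution. A simulation sketch of its coverage (the values of $\mu$, $\sigma$, and $n$ are arbitrary; $2.776$ is the 97.5th percentile of $t$ with $n-1 = 4$ degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 2.0, 3.0, 5, 100_000  # arbitrary illustrative values
t_crit = 2.776                               # 97.5th percentile of t, 4 df
x = rng.normal(mu, sigma, size=(trials, n))
m = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                    # sample standard deviation
half = t_crit * s / np.sqrt(n)
covered = np.mean((m - half <= mu) & (mu <= m + half))
print(covered)  # close to 0.95 even for n = 5
```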
Best Answer
Confidence intervals are great for illustrating the difference between epistemic and aleatory uncertainty.
Before you collect your sample, the probability statement is an aleatory statement -- that is, it pertains to the actual, repeated sampling (frequentist) probability that the (still TBD and random) interval will contain the true value of the parameter.
After you collect the sample and form your interval, you no longer have any aleatory uncertainty (we have our sample now and our interval -- all random values are now known). The resulting interval either contains the true parameter or it does not; we just don't know which. So in what sense should we care about this particular interval?
This is where epistemic uncertainty comes in. We know the aleatory/objective probability of the interval containing the true parameter is either 0 or 1. But we don't know which one! Therefore, the uncertainty is no longer in the values themselves, but in our knowledge. Given this, the post-sampling "confidence" is an epistemic statement (whereas pre-sample it was an actual probability statement).
So, for a 95% CI, we know that 95% of intervals formed this way will contain the true parameter; therefore, we should lean toward believing this interval contains the true parameter, accepting the fact that 5% of such intervals will actually not contain it (i.e., be misleading).
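The distinction shows up nicely in a small simulation (my own sketch; known $\sigma = 1$, $n = 25$, and $\mu = 0$ are arbitrary choices): across repeated samples the coverage is about 95%, while any single realized interval either covers the parameter or it doesn't.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, n, trials = 0.0, 25, 100_000        # arbitrary illustrative values
x = rng.normal(mu, 1.0, size=(trials, n))
half = 1.96 / np.sqrt(n)                # known sigma = 1
hits = (x.mean(axis=1) - half <= mu) & (mu <= x.mean(axis=1) + half)
print(hits.mean())  # aleatory: about 0.95 over repeated sampling
print(hits[0])      # epistemic: one realized interval simply covers or not
```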
Bottom line: pre-sampling, confidence is a true/aleatory probability. Post-sampling it cannot be interpreted as a frequentist probability, but it is valid to use confidence as a measure of how strongly you should believe the interval is accurate.