Your definitions appear to be correct.
The book to consult about these matters is Statistical Intervals (Gerald Hahn & William Meeker), 1991. I quote:
A prediction interval for a single future observation is an interval that will, with a specified degree of confidence, contain the next (or some other prespecified) randomly selected observation from a population.
[A] tolerance interval is an interval that one can claim to contain at least a specified proportion, p, of the population with a specified degree of confidence, $100(1-\alpha)\%$.
Here are restatements in standard mathematical terminology. Let the data $\mathbf{x}=(x_1,\ldots,x_n)$ be considered a realization of independent random variables $\mathbf{X}=(X_1,\ldots,X_n)$ with common cumulative distribution function $F_\theta$. ($\theta$ appears as a reminder that $F$ may be unknown but is assumed to lie in a given set of distributions $\{F_\theta \mid \theta \in \Theta\}$.) Let $X_0$ be another random variable with the same distribution $F_\theta$ and independent of the first $n$ variables.
A prediction interval (for a single future observation), given by endpoints $[l(\mathbf{x}), u(\mathbf{x})]$, has the defining property that
$$ \inf_\theta\{{\Pr}_\theta(X_0 \in [l(\mathbf{X}), u(\mathbf{X})])\}= 100(1-\alpha)\%.$$
Specifically, ${\Pr}_\theta$ refers to the $n+1$ variate distribution of $(X_0, X_1, \ldots, X_n)$ determined by the law $F_\theta$. Note the absence of any conditional probabilities: this is a full joint probability. Note, too, the absence of any reference to a temporal sequence: $X_0$ very well may be observed in time before the other values. It does not matter.
I'm not sure which aspect(s) of this may be "counterintuitive." If we conceive of selecting a statistical procedure as an activity to be pursued before collecting data, then this is a natural and reasonable formulation of a planned two-step process, because both the data ($X_i, i=1,\ldots,n$) and the "future value" $X_0$ need to be modeled as random.
A tolerance interval, given by endpoints $(L(\mathbf{x}), U(\mathbf{x})]$, has the defining property that
$$ \inf_\theta\{{\Pr}_\theta\left(F_\theta(U(\mathbf{X})) - F_\theta(L(\mathbf{X})) \ge p\right)\} = 100(1-\alpha)\%.$$
Note the absence of any reference to $X_0$: it plays no role.
When $\{F_\theta\}$ is the set of Normal distributions, there exist prediction intervals of the form
$$l(\mathbf{x}) = \bar{x} - k(\alpha, n) s, \quad u(\mathbf{x}) = \bar{x} + k(\alpha, n) s$$
($\bar{x}$ is the sample mean and $s$ is the sample standard deviation). Values of the function $k$, which Hahn & Meeker tabulate, do not depend on the data $\mathbf{x}$. There are other prediction interval procedures, even in the Normal case: these are not the only ones.
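As a hedged illustration of the joint-probability reading, the sketch below estimates the coverage of $\bar{x} \pm k s$ by simulation. It assumes the standard Normal-theory factor $k(\alpha, n) = t_{1-\alpha/2;\,n-1}\sqrt{1 + 1/n}$ rather than a value read from Hahn & Meeker's tables, and it hard-codes the Student $t$ quantile for $n = 10$ and $\alpha = 0.05$. The function name and all parameter values are illustrative choices, not part of the original answer.

```python
import math
import random

def prediction_coverage(n, k, mu, sigma, trials=20_000, seed=1):
    """Estimate Pr(X0 in [xbar - k*s, xbar + k*s]) over the joint draw of (X0, X1..Xn)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x0 = rng.gauss(mu, sigma)  # the "future" value, drawn independently of the sample
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        if xbar - k * s <= x0 <= xbar + k * s:
            hits += 1
    return hits / trials

T_975_9 = 2.262157  # Student t quantile at 0.975 with 9 degrees of freedom (table value)
n = 10
k = T_975_9 * math.sqrt(1 + 1 / n)

# The estimate should be close to 0.95 whichever (mu, sigma) generated the data.
for mu, sigma in [(0.0, 1.0), (50.0, 7.0)]:
    print(mu, sigma, round(prediction_coverage(n, k, mu, sigma), 3))
```

Both the sample and $X_0$ are redrawn in every trial, which is exactly the unconditional, joint probability in the definition. No single realized interval is claimed to cover with probability 95%.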
Similarly, there exist tolerance intervals of the form
$$L(\mathbf{x}) = \bar{x} - K(\alpha, n, p) s, \quad U(\mathbf{x}) = \bar{x} + K(\alpha, n, p) s.$$
There are other tolerance interval procedures: these are not the only ones.
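The defining property of a tolerance interval can be checked the same way. The sketch below is a simulation under stated assumptions: the data are standard Normal (enough here, because $\bar{x} \pm K s$ is location-scale equivariant), and $K$ comes from Howe's approximation, $K \approx z_{(1+p)/2}\sqrt{(n-1)(1+1/n)/\chi^2_{\alpha';\,n-1}}$, not from Hahn & Meeker's tables. The chi-square quantile is hard-coded and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist

def tolerance_confidence(n, K, p, trials=20_000, seed=1):
    """Estimate Pr(F(U) - F(L) >= p) for the interval xbar +/- K*s, standard Normal data."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    successes = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        # Proportion of the population actually captured by this realized interval.
        content = phi(xbar + K * s) - phi(xbar - K * s)
        if content >= p:
            successes += 1
    return successes / trials

n, p = 10, 0.90
CHI2_05_9 = 3.325113  # lower 5% quantile of chi-square with 9 df (table value)
z = NormalDist().inv_cdf((1 + p) / 2)
K_howe = z * math.sqrt((n - 1) * (1 + 1 / n) / CHI2_05_9)  # Howe's approximation

# The estimated confidence should be roughly 0.95 for this K.
print(round(K_howe, 3), round(tolerance_confidence(n, K_howe, p), 3))
```

The simulation never draws $X_0$. The only random object is the content $F_\theta(U) - F_\theta(L)$ of the interval itself.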
Noting the similarity among these pairs of formulas, we may solve the equation
$$k(\alpha, n) = K(\alpha', n, p).$$
This allows one to reinterpret a prediction interval as a tolerance interval (in many different possible ways by varying $\alpha'$ and $p$) or to reinterpret a tolerance interval as a prediction interval (only now $\alpha$ usually is uniquely determined by $\alpha'$ and $p$). This may be one origin of the confusion.
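This reinterpretation can be sketched numerically. The code below fixes the 95% prediction factor for $n = 10$ (again assuming $k = t_{0.975;9}\sqrt{1 + 1/n}$ with a hard-coded $t$ quantile) and estimates, for several values of $p$, the confidence $1-\alpha'$ with which the same interval acts as a tolerance interval. The function name and the grid of $p$ values are arbitrary illustrations.

```python
import math
import random
from statistics import NormalDist

def simulated_contents(n, k, trials=20_000, seed=1):
    """Population proportion captured by xbar +/- k*s in each simulated sample."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    contents = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        contents.append(phi(xbar + k * s) - phi(xbar - k * s))
    return contents

n = 10
k = 2.262157 * math.sqrt(1 + 1 / n)  # 95% prediction factor for n = 10
contents = simulated_contents(n, k)

# Each p yields its own confidence level 1 - alpha' for the same interval.
implied = {p: sum(c >= p for c in contents) / len(contents)
           for p in (0.80, 0.90, 0.95, 0.99)}
for p, conf in implied.items():
    print(f"p = {p:.2f}: estimated confidence 1 - alpha' = {conf:.3f}")

# The average content estimates the prediction coverage 1 - alpha.
mean_content = sum(contents) / len(contents)
print(round(mean_content, 3))
```

One interval supports a whole family of $(p, \alpha')$ pairs, while its prediction coverage is a single number: the expected content, which should come out near 0.95 here.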
Best Answer
This will depend on what you want to use the forecast and the interval for.
For instance, I do forecasting for retail, and our prediction intervals (more precisely: high quantile forecasts) are used for replenishment. In such a use case, the relevant thing is, say, a 95% quantile forecast, because this will translate more-or-less into a specific service level, which will hopefully balance the costs of understock and overstock.
As an example, let's look at the `AirPassengers` dataset. Maybe we want to provide enough capacity to achieve a service level of 95%. We can do this by fitting a model and looking at a 90% prediction interval, which consists of the 5% and the 95% quantile forecasts. In the forecast output, "Lo 90%" is the number such that we expect a 5% chance of observing fewer passengers, while "Hi 90%" is the number such that we expect a 95% chance of observing fewer passengers. Together, the two numbers bracket a prediction interval that we expect to have a 90% chance of covering the next observation.
So, to achieve a 95% service level, we would plan to provide capacity for 466.1 (thousand) passengers. (I'm glossing over different possible definitions of the service level, which don't really make a difference here. Plus, in planning for a longer lead time, we would need to take remaining autocorrelation into account, and so forth.)
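To make the link between the interval and the quantile forecasts concrete, here is a minimal sketch that assumes a Normal predictive distribution. The mean and standard deviation are hypothetical placeholders, not values from a model fitted to `AirPassengers`, so the numbers will not match the figure quoted above.

```python
from statistics import NormalDist

# Hypothetical predictive distribution for next month's passengers (in thousands).
predictive = NormalDist(mu=450.0, sigma=12.0)

lo_90 = predictive.inv_cdf(0.05)  # 5% quantile forecast: lower end of the 90% interval
hi_90 = predictive.inv_cdf(0.95)  # 95% quantile forecast: upper end of the 90% interval

# Planning capacity at hi_90 targets a 95% service level under this distribution.
service_level = predictive.cdf(hi_90)
interval_coverage = predictive.cdf(hi_90) - predictive.cdf(lo_90)
print(round(lo_90, 1), round(hi_90, 1), round(service_level, 2), round(interval_coverage, 2))
```

The upper end of a two-sided 90% interval is the quantity that matters for a 95% service level. The lower end plays no role in the capacity decision.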
Most forecasters I work with are only interested in prediction intervals, because you can verify them to a certain extent, simply by observing the next realization. Confidence intervals can never be verified, and you will have to trust that you have a correctly specified model for the mathematics to work. (And I have never seen the tolerance interval used in forecasting.)
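Checking prediction intervals against realizations can be as simple as counting hits. The sketch below uses made-up interval forecasts and outcomes purely for illustration. The function name is hypothetical.

```python
def empirical_coverage(lower, upper, actuals):
    """Fraction of realized values that fall inside their prediction intervals."""
    hits = [lo <= y <= hi for lo, hi, y in zip(lower, upper, actuals)]
    return sum(hits) / len(hits)

# Hypothetical 90% interval forecasts and the values later observed.
lower = [410.0, 395.0, 430.0, 455.0, 440.0]
upper = [470.0, 450.0, 495.0, 520.0, 500.0]
actuals = [432.0, 458.0, 471.0, 508.0, 447.0]

coverage = empirical_coverage(lower, upper, actuals)
print(coverage)
```

With only a handful of observations the hit rate is a very noisy estimate of the nominal level, so in practice one would accumulate many forecast and realization pairs before judging calibration.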
When you are predicting the winner of an election, a prediction interval does not really make sense, because this random variable is not numeric. It does make sense if you are predicting not the winner but, say, a candidate's share of the vote.