Consider the second tentative statement by the OP, slightly modified,
$$\forall \theta\in \Theta, \epsilon>0, \delta>0, S_n, \exists n_0(\theta, \epsilon, \delta): \forall n \geq n_0,\;\\P_n\big[|\hat \theta(S_n) - \theta^*|\geq \epsilon \big] < \delta \tag{1}$$
We are examining the sequence of real numbers, bounded in $[0,1]$,
$$\big\{ P_n\big[|\hat\theta(S_n) - \theta^*|\geq \epsilon \big]\big\}$$
indexed by $n$. If this sequence has a limit as $n\rightarrow \infty$, call it simply $p$, then we will have that
$$\forall \theta\in \Theta, \epsilon>0, \delta>0, S_n,\,\exists n_0(\theta, \epsilon, \delta): \forall n \geq n_0,\;\\\Big| P_n\big[|\hat\theta(S_n) - \theta^*|\geq \epsilon \big] -p\Big|< \delta \tag{2}$$
So if we assume (or require) $(1)$, we essentially assume (or require) that the limit as $n\rightarrow \infty$ exists and is equal to zero, $p=0$.
So $(1)$ reads "the limit of $P_n\big[|\hat\theta(S_n) - \theta^*|\geq \epsilon\big]$ as $n\rightarrow \infty$ is $0$", which is exactly the current definition of consistency (and yes, it covers "all possible samples").
So it appears that the OP essentially proposed an alternative expression for the exact same property of the estimator, not a different one.
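As a quick numerical sketch (my own illustration, not part of the OP's statement): if we take the concrete case where $\hat\theta(S_n)$ is the sample mean of $N(\theta^*, 1)$ data, the sequence above has a closed form and can be watched converging to $0$. The choice of $\epsilon$ below is arbitrary.

```python
# A minimal sketch (assumed example: theta_hat = sample mean of N(theta*, 1) data).
# Here X_bar_n ~ N(theta*, 1/n), so the tail probability is exact:
# P_n[|X_bar_n - theta*| >= eps] = 2 * (1 - Phi(sqrt(n) * eps)).
from scipy.stats import norm

eps = 0.1  # arbitrary choice
for n in [10, 100, 1_000, 10_000]:
    p_n = 2 * (1 - norm.cdf(n**0.5 * eps))
    print(f"n = {n:>6}:  P_n = {p_n:.6f}")  # the sequence heads to p = 0
```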
ADDENDUM (forgot the history part)
In his "Foundations of the Theory of Probability" (1933), Kolmogorov mentions in a footnote that (the concept of convergence in probability)
"...is due to Bernoulli;its completely general treatment was
introduced by E.E.Slutsky".
(in 1925). The work of Slutsky is in German -there may be even an issue of how the German word was translated in English (or the term used by Bernoulli). But don't try to read too much into a word.
Suppose $X_1, X_2, \ldots, X_n \stackrel{iid}{\sim} N(\mu, 1)$ and our goal is to estimate $\mu$. Consider the estimator
$$\hat\mu_n = \begin{cases}
\bar X_n, & \text{with probability $\frac{n-1}{n}$} \\[1.2ex]
n, & \text{with probability $\frac{1}{n}$}
\end{cases}$$
By introducing $Y_n \sim Bern\left(\frac{n-1}{n}\right)$ with $Y_i \perp \!\!\! \perp Y_j$ (for all $i\neq j$) and $Y_i \perp \!\!\! \perp X_j$ (for all $i$ and $j$) we can rewrite this estimator as
$$\hat\mu_n = \bar X_n Y_n + n(1-Y_n) .$$
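As an aside, here is a minimal simulation sketch of this estimator; the values of $n$, $\mu$, and the seed are arbitrary choices of mine, and `mu_hat` is a hypothetical helper name.

```python
# Sketch: simulate mu_hat_n = X_bar_n * Y_n + n * (1 - Y_n) for N(mu, 1) data.
import numpy as np

rng = np.random.default_rng(0)

def mu_hat(n, mu=2.0):
    x_bar = rng.normal(mu, 1.0, size=n).mean()  # X_bar_n
    y = rng.binomial(1, (n - 1) / n)            # Y_n ~ Bern((n-1)/n)
    return x_bar * y + n * (1 - y)              # usually ~ mu, occasionally = n

print([round(mu_hat(100), 3) for _ in range(5)])
```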
This estimator is consistent (in probability).
To see this, consider
\begin{align*}
P(|\hat\mu_n - \mu| < \epsilon) &= P(|\bar X_n - \mu| < \epsilon)\frac{n-1}{n} + P(|n-\mu|<\epsilon)\frac{1}{n} \\[1.3ex]
&=\frac{n-1}{n}\left\{\Phi\left(\sqrt n \epsilon \right)- \Phi\left(-\sqrt n \epsilon \right) \right\} + P(|n-\mu|<\epsilon)\frac{1}{n}
\end{align*}
Taking the limit gives
$$\lim_{n\rightarrow\infty} P(|\hat\mu_n - \mu| < \epsilon) = 1,$$
and thus $\hat\mu_n \stackrel{p}{\rightarrow} \mu$.
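A Monte Carlo sketch of this limit (again my own illustration; it uses the fact that $\bar X_n \sim N(\mu, 1/n)$, so the sample mean can be drawn directly, and the values of $\mu$, $\epsilon$, and the replication count are arbitrary):

```python
# Check empirically that P(|mu_hat_n - mu| < eps) -> 1 as n grows.
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 2.0, 0.5, 200_000  # arbitrary illustrative choices

for n in [10, 100, 1_000, 10_000]:
    x_bar = rng.normal(mu, 1 / np.sqrt(n), size=reps)  # X_bar_n ~ N(mu, 1/n)
    y = rng.binomial(1, (n - 1) / n, size=reps)        # Y_n ~ Bern((n-1)/n)
    mu_hat = x_bar * y + n * (1 - y)
    print(f"n = {n:>6}:  P(|mu_hat - mu| < eps) ~ {(np.abs(mu_hat - mu) < eps).mean():.4f}")
```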
This estimator is not MSE-consistent.
We start by finding the bias.
\begin{align*}
E(\hat\mu_n) &= E(\bar X_n Y_n + n(1-Y_n)) \\[1.2ex]
&= E(\bar X_n)E(Y_n) + nE(1-Y_n) \\[1.2ex]
&= \mu\frac{n-1}{n} + n\cdot\frac{1}{n} \\[1.2ex]
&= \frac{n-1}{n}\mu + 1
\end{align*}
Therefore the bias is $B_\mu(\hat\mu_n) = E(\hat\mu_n) - \mu = 1 - \mu/n$, which converges to $1$, not to $0$, as $n \rightarrow \infty$. Since $MSE(\hat\mu_n) = Var(\hat\mu_n) + B_\mu(\hat\mu_n)^2$ and the variance is non-negative, this is enough to conclude that $MSE(\hat\mu_n) \not\rightarrow 0$, and therefore the estimator is not MSE-consistent.
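The same simulation setup illustrates the failure of MSE-consistency: the bias settles near $1$, while the rare event $Y_n = 0$ contributes roughly $(n - \mu)^2 \cdot \frac{1}{n} \approx n$ to the MSE, so the MSE in fact grows with $n$. A hedged sketch (parameters again arbitrary):

```python
# Estimate bias and MSE of mu_hat_n by simulation; MSE grows roughly like n.
import numpy as np

rng = np.random.default_rng(2)
mu, reps = 2.0, 500_000  # arbitrary illustrative choices

for n in [10, 100, 1_000]:
    x_bar = rng.normal(mu, 1 / np.sqrt(n), size=reps)  # X_bar_n ~ N(mu, 1/n)
    y = rng.binomial(1, (n - 1) / n, size=reps)        # Y_n ~ Bern((n-1)/n)
    mu_hat = x_bar * y + n * (1 - y)
    bias = mu_hat.mean() - mu
    mse = np.mean((mu_hat - mu) ** 2)
    print(f"n = {n:>5}:  bias ~ {bias:.3f} (theory {1 - mu / n:.3f}),  MSE ~ {mse:.1f}")
```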
Best Answer
If the estimator is not consistent, it won't converge to the true value in probability. In other words, no matter how many data points you have, there always remains some probability that your estimator differs from the true value. This is bad in practice: even if you collect an immense amount of data, your estimate will always have a positive probability of being at least some $\epsilon>0$ away from the true value. Practically, you can think of this situation as using an estimator of a quantity such that even surveying the whole population, instead of a small sample of it, won't help you.