Consider the second tentative statement by the OP, slightly modified,
$$\forall \theta\in \Theta, \epsilon>0, \delta>0, S_n, \exists n_0(\theta, \epsilon, \delta): \forall n \geq n_0,\;\\P_n\big[|\hat \theta(S_{n}) - \theta^*|\geq \epsilon \big] < \delta \tag{1}$$
We are examining the bounded in $[0,1]$ sequence of real numbers
$$\big\{ P_n\big[|\hat\theta(S_{n}) - \theta^*|\geq \epsilon \big]\big\}$$
indexed by $n$. If this sequence has a limit as $n\rightarrow \infty$, call it $p$, then by the definition of a limit we will have that
$$\forall \theta\in \Theta, \epsilon>0, \delta>0, S_n,\,\exists n_0(\theta, \epsilon, \delta): \forall n \geq n_0,\;\\\Big| P_n\big[|\hat{\theta}(S_{n}) - \theta^*|\geq \epsilon \big] -p\Big|< \delta \tag{2}$$
So if we assume (or require) $(1)$, we essentially assume (or require) that the limit as $n\rightarrow \infty$ exists and is equal to zero, $p=0$.
So $(1)$ reads "the limit of $P_n\big[|\hat{\theta}(S_{n}) - \theta^*|\geq \epsilon\big]$ as $n\rightarrow \infty$ is $0$", which is exactly the current definition of consistency (and yes, it covers "all possible samples").
So it appears that the OP essentially proposed an alternative expression for the exact same property, and not a different property, of the estimator.
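Under this limit reading, consistency can be checked numerically. Here is a minimal Monte Carlo sketch (my own illustration, not part of the thread), taking $\hat\theta$ to be the sample mean of $n$ draws from $N(\theta^*, 1)$:

```python
import numpy as np

# Monte Carlo sketch: approximate p_n = P[|theta_hat(S_n) - theta*| >= eps]
# when theta_hat is the sample mean of n draws from N(theta*, 1).
# Consistency says p_n -> 0 as n grows.
rng = np.random.default_rng(0)
theta_star, eps, reps = 0.0, 0.2, 20_000

def p_n(n):
    samples = rng.normal(theta_star, 1.0, size=(reps, n))
    theta_hat = samples.mean(axis=1)          # one estimate per replication
    return np.mean(np.abs(theta_hat - theta_star) >= eps)

for n in (10, 100, 1000):
    print(n, p_n(n))  # the exceedance probability shrinks toward 0
```

The constants ($\theta^*=0$, $\epsilon=0.2$, 20,000 replications) are arbitrary choices for the demonstration.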
ADDENDUM (forgot the history part)
In his "Foundations of the Theory of Probability" (1933), Kolmogorov mentions in a footnote that (the concept of convergence in probability)
"...is due to Bernoulli; its completely general treatment was introduced by E. E. Slutsky"
(in 1925). Slutsky's work is in German, so there may even be an issue of how the German term was translated into English (or of the term used by Bernoulli). But don't try to read too much into a word.
Best Answer
To define the two terms without using too much technical language:
An estimator is consistent if, as the sample size increases, the estimates (produced by the estimator) "converge" to the true value of the parameter being estimated. To be slightly more precise - consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.
The two are not equivalent: Unbiasedness is a statement about the expected value of the sampling distribution of the estimator. Consistency is a statement about "where the sampling distribution of the estimator is going" as the sample size increases.
It certainly is possible for one condition to be satisfied but not the other - I will give two examples. For both examples consider a sample $X_1, ..., X_n$ from a $N(\mu, \sigma^2)$ population.
Unbiased but not consistent: Suppose you're estimating $\mu$. Then $X_1$ is an unbiased estimator of $\mu$ since $E(X_1) = \mu$. But, $X_1$ is not consistent since its distribution does not become more concentrated around $\mu$ as the sample size increases - it's always $N(\mu, \sigma^2)$!
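A quick simulation (a sketch I am adding, not part of the original answer) makes this visible: the sampling spread of $X_1$ stays at $\sigma$ no matter how large $n$ is, while the sample mean's spread shrinks like $\sigma/\sqrt{n}$:

```python
import numpy as np

# Sketch: compare the sampling spread of X_1 (unbiased but not consistent)
# with that of the sample mean X_bar as n grows. Constants are arbitrary.
rng = np.random.default_rng(1)
mu, sigma, reps = 5.0, 2.0, 20_000

def spreads(n):
    samples = rng.normal(mu, sigma, size=(reps, n))
    sd_x1 = samples[:, 0].std()            # X_1 ignores the other n - 1 points
    sd_xbar = samples.mean(axis=1).std()   # shrinks like sigma / sqrt(n)
    return sd_x1, sd_xbar

for n in (5, 500):
    print(n, spreads(n))
```

Both estimators are centered at $\mu$, but only the sample mean's distribution concentrates as $n$ increases.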
Consistent but not unbiased: Suppose you're estimating $\sigma^2$. The maximum likelihood estimator is $$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X})^2 $$ where $\overline{X}$ is the sample mean. It is a fact that $$ E(\hat{\sigma}^2) = \frac{n-1}{n} \sigma^2 $$ which can be derived using the information here. Therefore $\hat{\sigma}^2$ is biased for any finite sample size. We can also easily derive that $${\rm var}(\hat{\sigma}^2) = \frac{ 2\sigma^4(n-1)}{n^2}$$ From these facts we can informally see that the distribution of $\hat{\sigma}^2$ is becoming more and more concentrated at $\sigma^2$ as the sample size increases since the mean is converging to $\sigma^2$ and the variance is converging to $0$. (Note: This does constitute a proof of consistency, using the same argument as the one used in the answer here)
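Both facts can be checked by simulation. The following sketch (my addition, with assumed constants) compares the Monte Carlo mean and variance of $\hat{\sigma}^2$ against the formulas $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$ and ${\rm var}(\hat{\sigma}^2) = \frac{2\sigma^4(n-1)}{n^2}$:

```python
import numpy as np

# Sketch: verify E[sigma_hat^2] = (n-1)/n * sigma^2 and that
# var(sigma_hat^2) = 2*sigma^4*(n-1)/n^2 shrinks, so the biased MLE
# still concentrates at sigma^2 as n grows.
rng = np.random.default_rng(2)
mu, sigma2, reps = 0.0, 4.0, 10_000

def mle_stats(n):
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    # MLE of the variance: 1/n divisor (not the unbiased 1/(n-1))
    s2 = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
    return s2.mean(), s2.var()

for n in (10, 1000):
    mean_s2, var_s2 = mle_stats(n)
    print(n, mean_s2, (n - 1) / n * sigma2, var_s2)
```

At $n=10$ the bias is visible (mean near $0.9\,\sigma^2$); at $n=1000$ both the bias and the variance are nearly gone, which is the concentration the answer describes.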