The notion of a confidence interval is somewhat intuitive, but that very intuition may be keeping you from understanding what it means in more depth.
Say I have multiple samples $x_i$ from a population, and I wish to estimate the population mean $\mu$. A CI of, say, 95% represents an interval of possible values of $\mu$ such that, given my samples, the "probability" that $\mu$ lies in that interval is 95%.
We immediately see that there can be more than one such interval, since I could trade probability beyond the upper end for probability beyond the lower end, thus shifting the interval. Let's skirt that issue by demanding an interval symmetric about my sample mean.
But the "probability" is not well defined from the information I just presented!
In order to assign a probability, I have to make some assumptions about the population. The usual assumption is that the population variance equals the unbiased variance estimate obtained from our sample. But we still have things backward: we can't honestly talk about the probability of the population mean being in some range without some assumption about the a priori (before I saw my samples) probabilities of the mean taking various values.
So we apply the usual sleight-of-mind logic employed by the frequentist point of view. We ask:
Given that the population variance is our unbiased sample variance estimate, what are the highest and lowest values of the population mean $\mu$ such that the chance of our sample being as far away from $\mu$ as it is, is lower than 100% - 95% = 5%?
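To make that recipe concrete, here is a minimal sketch in Python; the sample values are made up, and the normal approximation for the sample mean is my own illustrative assumption, not part of the original argument:

```python
# Minimal sketch of the frequentist recipe above: treat the unbiased
# sample variance as the true population variance, then find the
# highest and lowest mu for which the observed sample mean is not in
# the outer 5%. (Sample values are hypothetical.)
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 2.8, 6.0, 4.7])   # hypothetical sample
n = len(x)
xbar = x.mean()
s = x.std(ddof=1)                  # unbiased estimate of the population sd

z = stats.norm.ppf(0.975)          # ~1.96; 2.5% in each tail
lo = xbar - z * s / np.sqrt(n)     # lowest mu not rejected at the 5% level
hi = xbar + z * s / np.sqrt(n)     # highest such mu
print(f"95% CI for mu: [{lo:.2f}, {hi:.2f}]")
```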
Now let's go back to your problem. Since the population is finite, as you draw more samples (without replacement) you actually do learn something about the population. Suppose you had drawn all the objects but one, taking your unbiased sample variance as the population variance. Your 95% confidence interval for the value of that one remaining object would be roughly $\pm 2\sigma$, but your estimate of the population mean would have a standard deviation of only about $\sigma/N$. This is quite a bit smaller than it would be for an infinite population, or for a small sample from a large population.
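Here is a rough simulation of that finite-population effect, under an assumed population of $N = 100$ values with $\sigma \approx 10$ (all specifics are illustrative):

```python
# Sketch: after seeing N-1 of N objects, the naive mean estimate is
# off from the true population mean only through the one unseen value,
# so its standard deviation is about sigma/N.
import numpy as np

rng = np.random.default_rng(0)
N = 100
population = rng.normal(50, 10, size=N)     # hypothetical finite population
estimates = [
    rng.choice(population, size=N - 1, replace=False).mean()
    for _ in range(10_000)
]

print("sd of the estimate:", np.std(estimates))          # close to sigma/N
print("sigma / N:         ", population.std(ddof=1) / N)
```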
Now when you draw that last sample, you know everything about the distribution. In particular, you know the mean exactly. Therefore any interval that includes the actual mean is a 100% CI. If you then say that the real CI is the tightest such interval, then it has width zero.
The two comments on this question are good. I'll try to flesh them out a bit more into an answer.
You're right that the interpretation of 95% confidence is as follows: if you collected many samples, and from each one generated a different confidence interval, then 95% of the intervals generated would capture the true mean $\mu$ inside them.
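Here is a small simulation sketch of exactly that interpretation, assuming a normal population with a known true mean (all numbers are made up for illustration):

```python
# Draw many samples, build a t-based 95% interval from each, and count
# how often the realized interval captures the true mean mu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, trials = 5.0, 2.0, 30, 10_000
t = stats.t.ppf(0.975, df=n - 1)

hits = 0
for _ in range(trials):
    x = rng.normal(mu, sigma, size=n)
    half = t * x.std(ddof=1) / np.sqrt(n)
    hits += (x.mean() - half) <= mu <= (x.mean() + half)

print(hits / trials)   # close to 0.95: the fraction of intervals capturing mu
```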
So, why is the other interpretation incorrect? It's tricky to see, in part because of the use of the placeholders $a, b$. Let's make this concrete and suppose your 95% interval for $\mu$ is specifically $[2.6, 8.3]$. If someone asked you for the probability that $\mu$ is in $[2.6, 8.3]$, your answer should be: "That question makes no sense."** Remember that $\mu$ is a fixed number; you just don't have the privilege of knowing which specific number it is. You would never ask for the probability that $2$ is in the interval $[2.6, 8.3]$, or the probability that $\pi = 3.14159...$ is in the interval $[2.6, 8.3]$. Either the numbers are in the interval, or they aren't.
That's the issue with the interpretation at the top of the question. Once you actually commit to a sample and its resulting confidence interval, it either has $\mu$ inside it, or it doesn't. The endpoints of the interval are no longer random variables (you've realized them into actual numbers), and $\mu$ was never a random variable in the first place. It's easy to obscure this nuance when you use placeholders like $a, b$ for the endpoints of the interval.
**I'll add the obligatory note: this is all the "frequentist" perspective on statistics. If you want to treat $\mu$ as a random variable, you can do that too; that approach is called "Bayesian" statistics, but it's usually not taught at introductory levels.
Certainly the probability that the interval you will get contains the population mean is $0.95$, but the conditional probability given the numbers that you got can be different. Here are four examples, one of which (the second) is realistic.
One instance where that would obviously happen is when you know the population mean, so that the conditional probability given what you know about both the population and the sample would be either $0$ or $1$.
A less extreme example is when you have a prior probability distribution for the population mean and your confidence interval falls within some region where the mean is unlikely to be. This can happen in some practical situations.
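For a hedged numeric illustration, assume a conjugate normal model: a tight prior $\mu \sim N(0,1)$ and noisy data with known $\sigma = 10$. The realized 95% interval can then sit mostly where the prior says $\mu$ is unlikely, and its posterior probability of containing $\mu$ ends up far from 95% (all values here are made up):

```python
# Conjugate normal update: prior mu ~ N(0, 1), data xbar from n draws
# of N(mu, sigma^2) with sigma known. Compare the classical interval's
# nominal 95% with its posterior probability under the prior.
import numpy as np
from scipy import stats

sigma, n, xbar = 10.0, 5, 9.0          # hypothetical observed sample mean

half = 1.96 * sigma / np.sqrt(n)       # classical 95% interval from data alone
lo, hi = xbar - half, xbar + half

post_var = 1 / (1 / 1.0 + n / sigma**2)          # standard conjugate update
post_mean = post_var * (n * xbar / sigma**2)
post = stats.norm(post_mean, np.sqrt(post_var))

print(post.cdf(hi) - post.cdf(lo))     # about 0.58 here, far from 0.95
```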
A more disturbing case goes like this: Suppose you have a sample of size $3$ from a uniform distribution on the interval $[0,A]$, and your confidence interval for the population mean $A/2$ is $[B\bar X,C\bar X]$, where $\bar X$ is the sample mean. I leave it as an exercise to find values of $B$ and $C$ that make this a $95\%$ confidence interval. Now suppose the sample you get is $1,2,99$, so that $\bar X=34$ and the realized interval is $[34B,34C]$. But the observation $99$ tells you that $A\ge 99$, so you know the mean is at least $99/2 = 49.5$, while (if I'm not mistaken) a sizable part of the realized interval lies below that bound. The data alone tell you that the nominal $95\%$ cannot be taken at face value for this particular interval. (This one is of course easily remedied by observing that the minimal sufficient statistic is the maximum observed value, and then using a confidence interval of the form $[B'\max,C'\max]$ for suitable constants $B'$ and $C'$.)
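If you want to check the exercise numerically, here is a Monte Carlo sketch under an equal-tailed convention for choosing $B$ and $C$ (other conventions give other constants; none of these numbers come from the original answer):

```python
# The pivot R = Xbar/(A/2) is scale-free, so simulate it with A = 1 and
# invert its quantiles: coverage P(B*Xbar <= A/2 <= C*Xbar) equals
# P(1/C <= R <= 1/B), set to 0.95 with 2.5% in each tail.
import numpy as np

rng = np.random.default_rng(2)
R = rng.uniform(0, 1, size=(1_000_000, 3)).mean(axis=1) / 0.5

B = 1 / np.quantile(R, 0.975)   # roughly 0.61
C = 1 / np.quantile(R, 0.025)   # roughly 2.82

xbar = 34.0                     # from the sample 1, 2, 99
print(B * xbar, C * xbar)       # realized interval, roughly [21, 96]
# The data force A >= 99, hence mean >= 49.5: everything in the
# interval below 49.5 is known to be impossible.
```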
A case that is perhaps even more disturbing is this. Two independent observations are uniformly distributed between $A-1/2$ and $A+1/2$. Call the larger of these $\max$ and the smaller $\min$. Clearly $[\min,\max]$ is a $50\%$ confidence interval for $A$. But if $\max-\min=0.0001$ then you would be a fool if you did not find it highly improbable that $A$ is between them, and if $\max-\min=0.9999$ you would be a fool if you were not nearly $100\%$ sure that $A$ is between them. This technique gives you a $50\%$ coverage rate, but the data tell you whether the instance you've got is likely to be among that $50\%$ or not. (This one also has a standard remedy: Ronald Fisher's technique of conditioning on an ancillary statistic, which in this case is $\max-\min$. You get a more reasonable $50\%$ confidence interval. I don't remember the details, but it is the same as the posterior distribution when you use an improper uniform prior on the real line.)
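Here is a simulation sketch of that conditional-coverage point, taking $A = 0$ without loss of generality (the cutoffs $0.9$ and $0.1$ are arbitrary illustrative choices):

```python
# [min, max] of two uniform(A-1/2, A+1/2) draws covers A exactly when
# the draws straddle A: 50% overall, but the range max-min is very
# informative about whether this particular interval covers.
import numpy as np

rng = np.random.default_rng(3)
A = 0.0
x = rng.uniform(A - 0.5, A + 0.5, size=(1_000_000, 2))
mn, mx = x.min(axis=1), x.max(axis=1)
covered = (mn <= A) & (A <= mx)

print(covered.mean())                  # ~0.50 overall, the nominal level
print(covered[mx - mn > 0.9].mean())   # 1.0: a range above 1/2 forces coverage
print(covered[mx - mn < 0.1].mean())   # ~0.05: narrow intervals almost never cover
```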