fwiw the medium length version I usually give goes like this:
You want to ask a question of a population but you can't. So you take a sample and ask the question of it instead. Now, how confident you should be that the sample answer is close to the population answer obviously depends on the structure of the population. One way you might learn about this is to take samples from the population again and again, ask them the question, and see how variable the sample answers tend to be. Since this isn't possible, you can either make some assumptions about the shape of the population, or you can use the information in the sample you actually have to learn about it.
Imagine you decide to make assumptions, e.g. that it is Normal, or Bernoulli, or some other convenient fiction. Following the previous strategy, you could again learn how much the answer to your question, when asked of a sample, might vary depending on which particular sample you happened to get, by repeatedly generating samples of the same size as the one you have and asking them the same question. That would be straightforward to the extent that you chose computationally convenient assumptions. (Indeed, particularly convenient assumptions plus non-trivial math may allow you to bypass the sampling part altogether, but we will deliberately ignore that here.)
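The parametric strategy just described can be sketched in a few lines. Everything here is illustrative: the Normal assumption, the seed, the sample size, and the choice of the sample mean as the question we ask.

```python
import random
import statistics

random.seed(0)
# The one sample we actually have (here simulated, purely for illustration).
observed = [random.gauss(10, 2) for _ in range(30)]

# Assume the population is Normal, with parameters estimated from the sample.
mu_hat = statistics.mean(observed)
sigma_hat = statistics.stdev(observed)

# Repeatedly generate fresh samples of the same size from the assumed
# population, and ask each one the same question (here: what is the mean?).
sample_means = []
for _ in range(2000):
    fake_sample = [random.gauss(mu_hat, sigma_hat) for _ in range(len(observed))]
    sample_means.append(statistics.mean(fake_sample))

# The spread of those answers shows how variable the sample answer is
# across samples of this size.
print(round(statistics.stdev(sample_means), 3))
```

For the Normal-mean case the non-trivial math mentioned above would give $\hat\sigma/\sqrt{n}$ directly, which is why the spread printed here lands close to that value.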
This seems like a good idea provided you are happy to make the assumptions. Imagine you are not. An alternative is to take the sample you have and sample from it instead. You can do this because the sample you have is also a population, just a very small discrete one; it looks like the histogram of your data. Sampling 'with replacement' is just a convenient way to treat the sample like it's a population and to sample from it in a way that reflects its shape.
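Treating the sample as a tiny discrete population and sampling from it with replacement is a one-liner. The data values here are hypothetical:

```python
import random

random.seed(1)
sample = [3.1, 4.7, 2.2, 5.0, 3.8, 4.1, 2.9, 4.4]  # hypothetical data

# The sample is itself a small discrete population; drawing with
# replacement samples from it in proportion to its histogram.
resample = random.choices(sample, k=len(sample))

print(sorted(resample))  # some values typically repeat, others drop out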
This is a reasonable thing to do because not only is the sample you have the best, indeed the only information you have about what the population actually looks like, but also because most samples will, if they're randomly chosen, look quite like the population they came from. Consequently it is likely that yours does too.
For intuition it is important to think about how you could learn about variability by aggregating sampled information that is generated in various ways and on various assumptions. Completely ignoring the possibility of closed form mathematical solutions is important to get clear about this.
Rather than representing problem in the bootstrap, this feature is sometimes used to estimate the bias in your original estimator, see for example chapter 10 of Bradley Efron and Robert Tibshirani (1993) "An Introduction to the Bootstrap". Chapman & Hall/CRC.
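The bias estimate referred to above works by comparing the average of the statistic over resamples to the original estimate. A small sketch, using the plug-in variance (which divides by $n$ and is known to be biased downward) as the illustrative statistic; the data and replication count are arbitrary:

```python
import random
import statistics

random.seed(2)
sample = [random.gauss(0, 1) for _ in range(25)]
n = len(sample)

def plugin_variance(xs):
    # Plug-in variance (divides by n), a textbook example of a biased estimator.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

theta_hat = plugin_variance(sample)

# Bootstrap bias estimate: the average of the statistic over resamples,
# minus the original estimate (cf. Efron & Tibshirani, ch. 10).
boot = [plugin_variance(random.choices(sample, k=n)) for _ in range(4000)]
bias_hat = statistics.mean(boot) - theta_hat

# A bias-corrected estimate subtracts the estimated bias.
corrected = theta_hat - bias_hat
print(round(bias_hat, 4), round(corrected, 4))
```

For this statistic the estimated bias comes out negative, matching the known downward bias of the divide-by-$n$ variance.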
Best Answer
Welcome to CV!
In bootstrapping, you repeatedly take samples with replacement from the original sample. The general idea is that you can estimate the uncertainty in your sample by asking: what if I hadn't observed this observation, or that one, or had observed this observation more than once?
You do this, say, $B = 1,000$ times, and end up with $1,000$ slightly different estimates of your statistic of interest. The more strongly the calculated statistic is affected by which observations happen to be included, the larger the variance of your bootstrapped statistic will be.
In fact, it turns out that the standard deviation of the bootstrapped statistic can be a really good estimator of the standard error of your statistic.
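Putting the last few paragraphs together, a minimal sketch; the data, $B$, and the choice of the median as the statistic of interest are all illustrative:

```python
import random
import statistics

random.seed(3)
sample = [random.gauss(50, 10) for _ in range(40)]  # hypothetical data
B = 1000

# B bootstrap replicates of the statistic of interest (here, the median).
replicates = [statistics.median(random.choices(sample, k=len(sample)))
              for _ in range(B)]

# The standard deviation of the replicates estimates the standard
# error of the median.
se_hat = statistics.stdev(replicates)
print(round(se_hat, 3))
```

Getting a standard error this way is the whole appeal: there is no convenient closed-form standard error for the median, yet the recipe is identical to the one you would use for the mean.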
And so, by simply randomly resampling our original sample with replacement, over and over, we have obtained an idea of how precise the estimate is, given that we only have a sample of the population.
Of course, if you can measure the entire population, then there is no point in bootstrapping.