fwiw the medium-length version I usually give goes like this:
You want to ask a question of a population but you can't. So you take a sample and ask the question of it instead. Now, how confident you should be that the sample answer is close to the population answer obviously depends on the structure of the population. One way you might learn about this is to take samples from the population again and again, ask them the question, and see how variable the sample answers tend to be. Since this isn't possible, you can either make some assumptions about the shape of the population, or you can use the information in the sample you actually have to learn about it.
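To make that thought experiment concrete, here is a toy simulation in Python. The exponential population, the sample size, and the sample mean standing in for "the question" are all illustrative choices, not part of any real procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend, for the thought experiment, that we can see the whole population.
population = rng.exponential(scale=2.0, size=100_000)

# Take samples again and again, ask each the same question (the mean),
# and watch how variable the answers are.
n = 50
answers = [rng.choice(population, size=n, replace=False).mean()
           for _ in range(2_000)]

print(np.std(answers))  # how much the sample answer tends to move around
```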
Imagine you decide to make assumptions, e.g. that the population is Normal, or Bernoulli, or some other convenient fiction. Following the previous strategy, you could again learn how much the answer to your question might vary from sample to sample by repeatedly generating samples, of the same size as the one you have, from the assumed population and asking each the same question. That would be straightforward to the extent that you chose computationally convenient assumptions. (Indeed, particularly convenient assumptions plus non-trivial math may allow you to bypass the sampling part altogether, but we will deliberately ignore that here.)
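A minimal sketch of that parametric strategy, assuming a Normal population whose parameters are estimated from the one sample you have (the seed, sample size, and statistic are again illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=3.0, size=40)  # stand-in for your real data

# Commit to the convenient fiction: the population is Normal,
# with parameters estimated from the sample.
mu_hat, sigma_hat = sample.mean(), sample.std(ddof=1)

# Repeatedly generate samples of the same size from the assumed
# population and ask each the same question.
answers = [rng.normal(mu_hat, sigma_hat, size=sample.size).mean()
           for _ in range(2_000)]

print(np.std(answers))  # variability of the answer under the assumption
```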
This seems like a good idea provided you are happy to make the assumptions. Imagine you are not. An alternative is to take the sample you have and sample from it instead. You can do this because the sample you have is also a population, just a very small discrete one; it looks like the histogram of your data. Sampling 'with replacement' is just a convenient way to treat the sample like it's a population and to sample from it in a way that reflects its shape.
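And the assumption-free version: the same loop, but drawing with replacement from the sample itself, i.e. the ordinary nonparametric bootstrap (the data and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=3.0, size=40)  # stand-in for your real data

# Treat the sample as a small discrete population: drawing with
# replacement samples from the histogram of the data itself.
answers = [rng.choice(sample, size=sample.size, replace=True).mean()
           for _ in range(2_000)]

print(np.std(answers))  # bootstrap estimate of the answer's variability
```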
This is a reasonable thing to do, not only because the sample you have is the best, indeed the only, information you have about what the population actually looks like, but also because most samples, if they're randomly chosen, look quite like the population they came from. Consequently it is likely that yours does too.
For intuition, it is important to think about how you could learn about variability by aggregating information from samples generated in various ways and under various assumptions. Deliberately ignoring the possibility of closed-form mathematical solutions helps in getting clear about this.
The median can be bootstrapped, and estimation of the median is a good application of the bootstrap. Staudte and Sheather (1990, pp. 83-85) derive the exact calculation of the bootstrap estimate of the standard error of the sample median, a result originally obtained by Maritz and Jarrett (1978). Details can be found on pages 48-50 of my book on the bootstrap.
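The Maritz-Jarrett result gives that standard error exactly in closed form; the usual Monte Carlo approximation to the same quantity takes only a few lines (the lognormal sample and the 5,000 resamples here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(size=25)  # illustrative skewed sample

# Monte Carlo approximation to the bootstrap standard error of the median.
boot_medians = [np.median(rng.choice(data, size=data.size, replace=True))
                for _ in range(5_000)]

print(np.std(boot_medians, ddof=1))  # approximates the exact closed-form value
```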
There are two methods related to your question. One is the m out of n bootstrap and the other is random subsampling. In his original proposal, Efron picked the bootstrap sample size to be the same as the original sample size. There was no specific requirement to do that, but the idea was to mimic random sampling from the population as closely as possible. However, there are situations where this ordinary bootstrap is inconsistent. Bickel and Ren, among others, showed that taking a smaller resample size m can lead to consistent results. This works asymptotically, with m and n both tending to infinity but at a rate such that m/n goes to 0. Random subsampling was introduced by Hartigan and McCarthy in the late 1960s, about a decade before the bootstrap. It works by randomly sampling subsets of the original sample. It may be that you could take either of these approaches with your data.
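A rough sketch of both ideas side by side. The rate m = n^(2/3) and the sample maximum (a textbook case where the ordinary bootstrap is inconsistent) are illustrative choices here, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=200)   # stand-in for the original sample
n = data.size
m = int(n ** (2 / 3))         # smaller resample size, with m/n -> 0 as n grows

# m out of n bootstrap: draw m < n observations WITH replacement.
mn_stats = [np.max(rng.choice(data, size=m, replace=True))
            for _ in range(2_000)]

# Random subsampling: draw subsets of size m WITHOUT replacement.
sub_stats = [np.max(rng.choice(data, size=m, replace=False))
             for _ in range(2_000)]

print(np.std(mn_stats), np.std(sub_stats))
```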
For information on the m out of n bootstrap you can consult either of the following books that I authored/co-authored:
An Introduction to Bootstrap Methods with Applications to R
Bootstrap Methods: A Guide for Practitioners and Researchers
This book by Politis, Romano and Wolf goes into random subsampling in great detail:
Subsampling