Short answers:
1. It simplifies it. (Frankly, I did not get the question).
2. No, you can never ignore it: the lack of i.i.d. sampling has immediate consequences for the variances of whatever you are estimating.
Medium answer: Pretty much the central issue with the bootstrap is, 'Does the proposed procedure reproduce the features of the data?'. Violation of the i.i.d. assumption is a big deal: your data are dependent, you (most likely) have less information in your data than you would have in an i.i.d. sample of the same size, and if you run a naive bootstrap (resample the individual observations), the standard errors you get from it will be too small. The proposed procedure circumvents the problem of lack of independence by capturing (or at least attempting to capture) the dependence in the model structure and parameters. If successful, each bootstrap sample would reproduce the features of the data, as needed.
Long answer: There are multiple layers of assumptions concerning the bootstrap, and even in the simplest possible case (i.i.d. data, estimation of the mean), you have to make at least three assumptions: (1) the statistic of interest is a smooth function of the data (true in the case of the mean, not so true even in the case of percentiles, totally off with, say, nearest-neighbor matching estimators); (2) the distribution from which you bootstrap is "close" to the population distribution (works OK in the case of i.i.d. data; may not work OK in the case of dependent data, where you essentially have only one trajectory = one observation in the case of time series, and you have to invoke additional assumptions like stationarity and mixing to stretch this single observation into a quasi-population); (3) your Monte Carlo bootstrap sampling is a good enough approximation to the complete bootstrap over all possible resamples (i.e., the inaccuracy from using Monte Carlo instead of the complete bootstrap is much smaller than the uncertainty you are trying to capture). In the case of the parametric bootstrap, you also make the assumption that (4) your model perfectly explains all the features of the data.
As a warning of what could go wrong with (4), think about regression with heteroskedastic errors: $y = x\beta + \epsilon$, $\mathrm{Var}[\epsilon] = \exp[x\gamma]$, say. If you fit an OLS model and resample the residuals as if they were i.i.d., you will get a wrong answer (some sort of $\bar\sigma^2 (X'X)^{-1}$, where $\bar\sigma^2$ is the average $\frac{1}{n}\sum_i \exp[x_i \gamma]$, instead of the appropriate $(X'X)^{-1} \sum_i \exp[x_i \gamma] x_i x_i' (X'X)^{-1}$). So if you wanted a fully parametric bootstrap solution, you would have to fit the model for the heteroskedasticity along with the model for the mean. And if you suspect serial or some other sort of correlation, you would have to fit a model for that, too. (See, the non-parametric, distribution-free flavor of the bootstrap is pretty much gone by now, as you have replaced the voice of the data with the synthesized voice of your model.)
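To make that concrete, here is a minimal R sketch of the failure mode; the data-generating process, sample size, and replication count are my own illustrative choices. The naive residual bootstrap treats the residuals as i.i.d. and, in this design, understates the slope's standard error relative to a heteroskedasticity-consistent (sandwich) benchmark.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 3)
beta <- 2; gamma <- 1
# heteroskedastic errors: Var[eps_i] = exp(x_i * gamma)
y <- x * beta + rnorm(n, sd = sqrt(exp(x * gamma)))
fit <- lm(y ~ x)

# naive residual bootstrap: resample residuals as if they were i.i.d.
B <- 2000
beta_naive <- replicate(B, {
  y_b <- fitted(fit) + sample(resid(fit), n, replace = TRUE)
  coef(lm(y_b ~ x))[2]
})
sd(beta_naive)   # close to the homoskedastic OLS standard error; too small here

# heteroskedasticity-consistent (HC0 sandwich) benchmark, computed by hand
X <- cbind(1, x)
u <- resid(fit)
bread <- solve(t(X) %*% X)
meat  <- t(X) %*% (X * u^2)               # sum of u_i^2 * x_i x_i'
sqrt(diag(bread %*% meat %*% bread))[2]   # noticeably larger than sd(beta_naive)
```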
The method you described works around the i.i.d. assumption by creating a whole new sample. The greatest problem with bootstrapping dependent data is creating samples whose dependence patterns are sufficiently close to those in the original data. With time series, you could use block bootstraps; with clustered data, you bootstrap whole clusters; with heteroskedastic regression, you have to go with the wild bootstrap (which is a better idea than the bootstrap of residuals, even if you have fitted a heteroskedasticity model to them). In the block bootstrap, you have to make an educated guess (or, in other words, have good reasons to believe) that distant parts of the time series are approximately independent, so that all of the correlation structure is captured by the adjacent 5 or 10 observations that form the block. So instead of resampling observations one by one, which totally ignores the correlation structure of the time series, you resample them in blocks, hoping that this will respect the correlation structure. The parametric bootstrap you referred to says: "Rather than fiddling with the data and assembling the new dolls from the pieces of the old ones, why don't I just stamp the whole molded Barbie for you instead? I've figured out what kind of Barbies you like, and I promise I will make you one you'd like, too."
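For the time-series case, here is a minimal sketch of a moving-block bootstrap of the mean; the AR(1) series, the block length of 10, and the replication count are illustrative assumptions, not recommendations.

```r
set.seed(1)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))  # positively autocorrelated series
n <- length(y); block_len <- 10

block_resample <- function(y, block_len) {
  n <- length(y)
  n_blocks <- ceiling(n / block_len)
  # draw random block start points; within-block dependence is preserved
  starts <- sample(1:(n - block_len + 1), n_blocks, replace = TRUE)
  y_star <- unlist(lapply(starts, function(s) y[s:(s + block_len - 1)]))
  y_star[1:n]   # trim to the original length
}

B <- 2000
se_block <- sd(replicate(B, mean(block_resample(y, block_len))))
se_naive <- sd(replicate(B, mean(sample(y, n, replace = TRUE))))
se_block   # reflects the serial correlation
se_naive   # ignores it and comes out too small
```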
In the case of the parametric bootstrap you described, you have to be pretty damn sure that your HMM model fit is pretty much perfect; otherwise your parametric bootstrap may lead to incorrect results (Barbies that cannot move their arms). Think about the heteroskedastic regression example above, or think about fitting an AR(1) model to AR(5) data: whatever you do with the parametrically simulated data, they won't have the structure the original data had.
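A sketch of the AR(1)-fitted-to-AR(5) warning; the AR(5) coefficients (with a strong lag-5 term) are invented purely for illustration.

```r
set.seed(1)
# "true" process: AR(5) with a pronounced lag-5 dependence
y <- arima.sim(model = list(ar = c(0.2, 0, 0, 0, 0.6)), n = 1000)

# misspecified parametric bootstrap: fit an AR(1), then simulate from the fitted model
fit1   <- arima(y, order = c(1, 0, 0))
y_star <- arima.sim(model = list(ar = coef(fit1)["ar1"]),
                    n = length(y), sd = sqrt(fit1$sigma2))

acf(y, lag.max = 10, plot = FALSE)       # spike at lag 5 in the original data
acf(y_star, lag.max = 10, plot = FALSE)  # geometric decay only; the lag-5 structure is gone
```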
Edit: as Sadeghd clarified his question, I can respond to that as well. There is a humongous variety of bootstrap procedures, each addressing a particular quirk in the statistic, the sample size, the dependence, or whatever the issue with the bootstrap might be. There is no single way to address dependence, for instance. (I've worked with survey bootstraps; there are about eight different procedures, although some are mostly of methodological rather than practical interest, and some are clearly inferior in that they are only applicable in special, not easily generalizable, cases.) For a general discussion of the issues you could face with the bootstrap, see Canty, Davison, Hinkley, and Ventura (2006), "Bootstrap diagnostics and remedies," The Canadian Journal of Statistics, 34(1), 5–27.
Yes, you are right. But the parametric bootstrap yields better results when its assumptions hold. Think of it this way:
We have a random sample $X_1, \ldots, X_n$ from a distribution $F$. We estimate a parameter of interest $\theta$ as a function of the sample, $\hat{\theta} = h(X_1, \ldots, X_n)$. This estimate is a random variable, so it has a distribution we call $G$. This distribution is fully determined by $h$ and $F$, meaning $G = G(h, F)$. When doing any kind of bootstrap (parametric, non-parametric, resampling), what we are doing is estimating $F$ with $\hat{F}$ in order to get an estimate of $G$, $\hat{G} = G(h, \hat{F})$. From $\hat{G}$ we estimate the properties of $\hat{\theta}$. What changes from one type of bootstrap to another is how we get $\hat{F}$.
If you can analytically calculate $\hat{G} = G(h, \hat{F})$, you should go for it, but in general that is a rather hard thing to do. The magic of the bootstrap is that we can generate samples with distribution $\hat{G}$.
To do this, we generate random samples $X^b_1, \ldots, X^b_n$ with distribution $\hat F$ and calculate $\hat {\theta}^b = h(X^b_1, \ldots, X^b_n)$ which will follow the $\hat G$ distribution.
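A minimal R sketch of that recipe with a parametric $\hat{F}$; the exponential model and the rate statistic are placeholders for whatever $F$ and $h$ are in your problem.

```r
set.seed(1)
x <- rexp(50, rate = 2)          # the observed sample; F is unknown in practice
h <- function(x) 1 / mean(x)     # statistic of interest (here, the MLE of the rate)

# parametric bootstrap: take Fhat to be Exponential(rate = h(x)) and sample from it
rate_hat   <- h(x)
B          <- 5000
theta_star <- replicate(B, h(rexp(length(x), rate = rate_hat)))

sd(theta_star)                        # estimate of the standard error of theta-hat
quantile(theta_star, c(0.025, 0.975)) # estimate of the spread of G-hat
```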
Once you think of it this way, the advantage of the parametric bootstrap is obvious: if the parametric model is (at least approximately) correct, $\hat{F}$ will be a better approximation of $F$, then $\hat{G}$ will be closer to $G$, and finally the estimates of $\hat{\theta}$'s properties will be better.
The answer given by miura is not entirely accurate, so I am answering this old question for posterity:
(2). These are very different things. The empirical cdf is an estimate of the CDF (distribution) which generated the data. Precisely, it is the discrete CDF which assigns probability $1/n$ to each observed data point, $\hat{F}(x) = \frac{1}{n}\sum_{i=1}^n I(X_i\leq x)$, for each $x$. This estimator converges to the true cdf: $\hat{F}(x) \to F(x) = P(X_i\leq x)$ almost surely for each $x$ (in fact uniformly).
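In R, the empirical cdf is available as `ecdf()`; a tiny check of the definition (the sample and the evaluation point are arbitrary):

```r
set.seed(1)
x <- rnorm(25)
Fhat <- ecdf(x)
Fhat(0.5)          # the empirical cdf evaluated at x = 0.5
mean(x <= 0.5)     # the same number, straight from the definition above
```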
The sampling distribution of a statistic $T$ is instead the distribution of the statistic you would expect to see under repeated experimentation. That is, you perform your experiment once and collect data $X_1,\ldots,X_n$. $T$ is a function of your data: $T = T(X_1,\ldots,X_n)$. Now suppose you repeat the experiment and collect data $X'_1,\ldots,X'_n$. Recalculating $T$ on the new sample gives $T' = T(X'_1,\ldots,X'_n)$. If we collected 100 samples, we would have 100 estimates of $T$. These observations of $T$ form the sampling distribution of $T$. It is a true distribution. As the number of experiments goes to infinity, its mean converges to $E(T)$ and its variance to $\mathrm{Var}(T)$.
In general, of course, we don't repeat experiments like this; we only ever see one instance of $T$. Figuring out the variance of $T$ from a single observation is very difficult if you don't know the underlying distribution of $T$ a priori. Bootstrapping is a way to estimate that sampling distribution of $T$ by artificially running "new experiments" on which to calculate new instances of $T$. Each new sample is actually just a resample from the original data. That this gives you meaningful information about the sampling distribution of $T$ using nothing but the original data is mysterious and totally awesome.
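A minimal sketch of both ideas side by side; knowing the true $F$ (here an Exponential(1), an arbitrary choice) lets us simulate the "repeated experiments" version, which in practice is unavailable, and compare it with a bootstrap approximation built from a single sample.

```r
set.seed(1)
n <- 30
T_stat <- function(x) median(x)

# the "repeated experiments" sampling distribution: only possible because we know F here
T_true <- replicate(5000, T_stat(rexp(n)))

# in practice we observe one sample and bootstrap it instead
x <- rexp(n)
T_boot <- replicate(5000, T_stat(sample(x, n, replace = TRUE)))

sd(T_true)   # spread of the true sampling distribution of T
sd(T_boot)   # bootstrap estimate of that spread, from the single observed sample
```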
(1). You are correct--you would not do this. The author is trying to motivate the parametric bootstrap by describing it as doing "what you would do if you knew the distribution" but substituting a very good estimator of the distribution function--the empirical cdf.
For example, suppose you know that your test statistic $T$ is normally distributed with mean zero, variance one. How would you estimate the sampling distribution of $T$? Well, since you know the distribution, a silly and redundant way to estimate the sampling distribution is to use R to generate 10,000 or so standard normal random variables, then take their sample mean and variance, and use these as our estimates of the mean and variance of the sampling distribution of $T$.
If we don't know a priori the parameters of $T$, but we do know that it's normally distributed, what we can do instead is generate 10,000 or so samples from the empirical cdf, calculate $T$ on each of them, then take the sample mean and variance of these 10,000 $T$s, and use them as our estimates of the expected value and variance of $T$. Since the empirical cdf is a good estimator of the true cdf, the sample parameters should converge to the true parameters. This is the parametric bootstrap: you posit a model on the statistic you want to estimate. The model is indexed by a parameter, e.g. $(\mu, \sigma)$, which you estimate from repeated sampling from the ecdf.
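A sketch of that recipe; the data, the choice of $T$ (a sample mean), and the replication count are my own illustrative assumptions.

```r
set.seed(1)
x <- rgamma(40, shape = 2)       # the one observed sample
T_stat <- function(x) mean(x)    # a statistic we are willing to assume is normally distributed

B <- 10000
T_star <- replicate(B, T_stat(sample(x, length(x), replace = TRUE)))  # resamples from the ecdf

# parametric step: summarize the replicates by the two parameters of the assumed normal model
mu_hat    <- mean(T_star)
sigma_hat <- sd(T_star)
qnorm(c(0.025, 0.975), mean = mu_hat, sd = sigma_hat)  # normal-theory interval for T
```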
(3). The nonparametric bootstrap doesn't even require you to know a priori that $T$ is normally distributed. Instead, you simply draw repeated samples from the ecdf, and calculate $T$ on each one. After you've drawn 10,000 or so samples and calculated 10,000 $T$s, you can plot a histogram of your estimates. This is a visualization of the sampling distribution of $T$. The nonparametric bootstrap won't tell you that the sampling distribution is normal, or gamma, or so on, but it allows you to estimate the sampling distribution (usually) as precisely as needed. It makes fewer assumptions and provides less information than the parametric bootstrap. It is less precise when the parametric assumption is true but more accurate when it is false. Which one you use in each situation you encounter depends entirely on context. Admittedly more people are familiar with the nonparametric bootstrap but frequently a weak parametric assumption makes a completely intractable model amenable to estimation, which is lovely.
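For contrast, a nonparametric sketch with a statistic about whose distribution we make no shape assumption (the sample standard deviation, my own choice): the histogram and the empirical quantiles are all we use.

```r
set.seed(1)
x <- rgamma(40, shape = 2)       # the one observed sample
T_stat <- function(x) sd(x)      # no distributional assumption about T

B <- 10000
T_star <- replicate(B, T_stat(sample(x, length(x), replace = TRUE)))

hist(T_star, breaks = 50, main = "Bootstrap estimate of the sampling distribution of T")
quantile(T_star, c(0.025, 0.975))  # percentile interval, read straight off the resamples
```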