Short answers:
1. It simplifies it. (Frankly, I did not get the question).
2. No, you can never ignore it, as the lack of i.i.d. sampling has immediate consequences for the variances of whatever you are estimating.
Medium answer: Pretty much the central issue with the bootstrap is, 'Does the proposed procedure reproduce the features of the data?'. Violation of the i.i.d. assumption is a big deal: your data are dependent, you (most likely) have less information in your data than you would have in an i.i.d. sample of the same size, and if you run a naive bootstrap (resample the individual observations), the standard errors you get from it will be too small. The proposed procedure circumvents the problem of lack of independence by capturing (or at least attempting to capture) the dependence in the model structure and parameters. If successful, each bootstrap sample would reproduce the features of the data, as needed.
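To see how badly a naive resample can understate the uncertainty, here is a minimal simulation sketch (Python; the AR(1) series with coefficient 0.8 and all the sample sizes are illustrative choices of mine, not anything from the question). The i.i.d.-style bootstrap standard error of the mean comes out much smaller than the actual sampling variability.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, n_boot, n_sim = 200, 0.8, 500, 1000  # illustrative choices

def ar1(n, rho, rng):
    """Simulate a stationary AR(1) series with standard-normal innovations."""
    x = np.zeros(n)
    x[0] = rng.normal(scale=1 / np.sqrt(1 - rho ** 2))  # start from the stationary distribution
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

# "True" sampling variability of the mean, by brute force
true_se = np.std([ar1(n, rho, rng).mean() for _ in range(n_sim)])

# Naive bootstrap: resample individual observations as if they were i.i.d.
x = ar1(n, rho, rng)
naive_se = np.std([rng.choice(x, size=n, replace=True).mean()
                   for _ in range(n_boot)])

print(f"true SE of the mean:       {true_se:.3f}")
print(f"naive i.i.d. bootstrap SE: {naive_se:.3f}")  # noticeably smaller
```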
Long answer: There are multiple layers of assumptions behind the bootstrap, and even in the simplest possible case (i.i.d. data, estimation of the mean), you have to make at least three: (1) the statistic of interest is a smooth function of the data (true in the case of the mean, not so true even in the case of percentiles, totally off with, say, nearest-neighbor matching estimators); (2) the distribution from which you bootstrap is "close" to the population distribution (works OK in the case of i.i.d. data; may not work OK in the case of dependent data, where you essentially have only one trajectory = one observation in the case of time series, and you have to invoke additional assumptions like stationarity and mixing to stretch this single observation into a quasi-population); (3) your Monte Carlo bootstrap sampling is a good enough approximation to the complete bootstrap over all possible resamples (i.e., the inaccuracy from using Monte Carlo instead of the complete bootstrap is much smaller than the uncertainty you are trying to capture). In the case of the parametric bootstrap, you also make the assumption that (4) your model perfectly explains all the features of the data.
As a warning of what could go wrong with (4), think about regression with heteroskedastic errors: $y=x\beta + \epsilon$, $\operatorname{Var}[\epsilon] = \exp[x\gamma]$, say. If you fit an OLS model and resample the residuals as if they were i.i.d., you will get a wrong answer (some sort of $\bar\sigma^2 (X'X)^{-1}$, where $\bar\sigma^2$ is the average $\frac{1}{n} \sum_i \exp[x_i \gamma]$, instead of the appropriate sandwich $(X'X)^{-1} \sum_i \exp[x_i \gamma] x_i x_i' (X'X)^{-1}$). So if you wanted a fully parametric bootstrap solution, you would have to fit the model for the heteroskedasticity along with the model for the mean. And if you suspect serial or some other sort of correlation, you would have to fit a model for that, too. (See, the non-parametric, distribution-free flavor of the bootstrap is pretty much gone by now, as you have replaced the voice of the data with the synthesized voice of your model.)
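As a toy illustration of that warning (Python; the design and the variance function are made-up choices of mine, not the poster's model), compare the residual-bootstrap standard error of the slope, which behaves like the homoskedastic $\bar\sigma^2(X'X)^{-1}$ formula, with a heteroskedasticity-consistent sandwich standard error:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, gamma, n_boot = 500, 1.0, 2.0, 1000   # made-up values for illustration

x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n) * np.exp(gamma * x / 2)   # Var[eps] = exp(gamma * x)
y = beta * x + eps

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_hat

# Residual bootstrap: treats the residuals as exchangeable (wrong under heteroskedasticity)
boot_slopes = []
for _ in range(n_boot):
    y_star = X @ b_hat + rng.choice(resid, size=n, replace=True)
    boot_slopes.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])
resid_boot_se = np.std(boot_slopes)

# Heteroskedasticity-consistent (HC0 sandwich) standard error, for comparison
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
sandwich_se = np.sqrt((bread @ meat @ bread)[1, 1])

print(f"residual-bootstrap SE of the slope: {resid_boot_se:.3f}")
print(f"sandwich (HC0) SE of the slope:     {sandwich_se:.3f}")
```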
The method you described works around the i.i.d. assumption by creating a whole new sample. The greatest problem with the dependent-data bootstrap is creating a sample whose dependence patterns are sufficiently close to those in the original data. With time series, you can use block bootstraps; with clustered data, you bootstrap whole clusters; with heteroskedastic regression, you go with wild bootstraps (which is a better idea than the bootstrap of residuals, even if you have fitted a heteroskedasticity model to them). In the block bootstrap, you have to make an educated guess (or, in other words, have good reasons to believe) that distant parts of the time series are approximately independent, so that all of the correlation structure is captured by the adjacent 5 or 10 observations that form a block. So instead of resampling observations one by one, which totally ignores the correlation structure of the time series, you resample them in blocks, hoping that this respects the correlation structure. The parametric bootstrap you referred to says: "Rather than fiddling with the data and assembling new dolls from the pieces of the old ones, why don't I just stamp the whole molded Barbie for you instead? I've figured out what kind of Barbies you like, and I promise I will make you one you'd like, too."
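To make the block idea concrete, here is a bare-bones moving-block bootstrap for the standard error of a mean (Python; the AR(1) series and the block length of 10 are illustrative assumptions, not anything taken from the question):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dependent series: AR(1) with coefficient 0.8 (illustrative choice)
n, rho = 200, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

def block_bootstrap_mean_se(x, block_len, n_boot, rng):
    """Moving-block bootstrap SE of the mean: resample overlapping blocks of
    consecutive observations and glue them into a series of the same length."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = np.arange(n - block_len + 1)   # every possible block starting point
    means = []
    for _ in range(n_boot):
        picks = rng.choice(starts, size=n_blocks, replace=True)
        x_star = np.concatenate([x[s:s + block_len] for s in picks])[:n]
        means.append(x_star.mean())
    return np.std(means)

# block_len = 10 encodes the guess that observations 10+ lags apart are roughly
# independent; block_len = 1 reduces to the naive i.i.d. bootstrap.
print(block_bootstrap_mean_se(x, block_len=10, n_boot=1000, rng=rng))
print(block_bootstrap_mean_se(x, block_len=1,  n_boot=1000, rng=rng))
```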
In the case of the parametric bootstrap you described, you have to be pretty damn sure that your HMM model fit is pretty much perfect; otherwise your parametric bootstrap may lead to incorrect results (Barbies that cannot move their arms). Think about the heteroskedastic regression example above, or think about fitting an AR(1) model to AR(5) data: whatever you do with the parametrically simulated data, they won't have the structure the original data had.
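As a quick illustration of the AR(1)-fit-to-AR(5)-data warning (Python; the AR(5) coefficients are made up purely for the example), data simulated from the fitted AR(1) typically carry almost none of the lag-5 correlation that the original data have:

```python
import numpy as np

rng = np.random.default_rng(3)

def acf(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# "True" process: an AR(5) with a sizeable coefficient at lag 5 (made-up values)
phi = np.array([0.3, 0.0, 0.0, 0.0, 0.5])
n, burn = 2000, 100
y = np.zeros(n + burn)
for t in range(5, len(y)):
    y[t] = phi @ y[t - 5:t][::-1] + rng.normal()
y = y[burn:]

# Misspecified parametric model: AR(1) fitted by a lag-1 regression
yc = y - y.mean()
phi1 = np.dot(yc[:-1], yc[1:]) / np.dot(yc[:-1], yc[:-1])
sigma = np.std(yc[1:] - phi1 * yc[:-1])

# One parametric-bootstrap draw from the fitted AR(1)
y_star = np.zeros(n)
for t in range(1, n):
    y_star[t] = phi1 * y_star[t - 1] + sigma * rng.normal()

print(f"lag-5 autocorrelation, original data:        {acf(y, 5):.2f}")
print(f"lag-5 autocorrelation, parametric bootstrap: {acf(y_star, 5):.2f}")
```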
Edit: as Sadeghd clarified his question, I can respond to that as well. There is a humongous variety of bootstrap procedures, each addressing a particular quirk in the statistic, the sample size, the dependence, or whatever other issue the bootstrap may run into. There is no single way to address dependence, for instance. (I've worked with survey bootstraps; there are about eight different procedures, although some are mostly of methodological rather than practical interest, and some are clearly inferior in that they are only applicable in special, not easily generalizable, cases.) For a general discussion of the issues you could face with the bootstrap, see Canty, Davison, Hinkley and Ventura (2006), "Bootstrap diagnostics and remedies", The Canadian Journal of Statistics, 34(1), 5-27.
How you prove this depends on your statistic $\hat\theta$; there are straightforward versions and complicated ones, according to how difficult $\hat\theta$ is. Fundamentally, though, the bootstrap works by the delta method; there's no need for the Berry-Esseen theorem.
A simplifying fact is that $\hat\theta$ is going to have an asymptotically Normal distribution, so getting the tail probabilities asymptotically correct amounts to getting the asymptotic mean and variance parameters correct.
Often, your statistic $\hat\theta$ will be a differentiable function of a mean. In that case, you prove that the bootstrap is correct for the mean, and it transfers to being correct for the statistic automatically. Or your $\hat\theta$ solves
$$\frac{1}{n}\sum_{i=1}^n U_i(\theta)=0$$
and $U_i$ has some regularity conditions that make $\hat\theta$ asymptotically Normal. Again, you show the bootstrap is correct for the mean, and the same arguments that make $\hat\theta$ asymptotically Normal also show the bootstrap is correct.
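To spell out the standard reasoning for the estimating-equation case (under the usual regularity conditions; this is just the textbook sandwich expansion, nothing specific to the original question), a first-order Taylor expansion of the estimating equation around $\theta_0$ gives
$$\hat\theta-\theta_0 \approx \left(-\frac{1}{n}\sum_{i=1}^n \frac{\partial U_i(\theta_0)}{\partial\theta}\right)^{-1}\frac{1}{n}\sum_{i=1}^n U_i(\theta_0),$$
so to first order $\hat\theta$ is a smooth function of the sample mean of the $U_i(\theta_0)$, and correctness of the bootstrap for that mean carries over to $\hat\theta$.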
There's a more general argument that requires more background maths. If your data are from an iid sequence in $\mathbb{R}^d,$ then the empirical CDF is asymptotically Normal:
$$\sqrt{n}(\mathbb{F}_n-F)\stackrel{w}{\to} Z$$
where $Z$ is a Gaussian process indexed by $\mathbb{R}^d.$ The delta method then says that any suitably differentiable function $\theta(\mathbb{F}_n)$ is asymptotically Normal. Since the bootstrap is correct for $\mathbb{F}_n$, in the sense that
$$\sqrt{n}(\mathbb{F}^*_n-\mathbb{F}_n)\stackrel{w}{\to} Z$$
for the same limiting $Z$ (for almost all data sequences), the delta method also says the bootstrap is correct.
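Stated slightly more formally (this is the functional delta method; see the reference below for the precise conditions): if $\theta$ is Hadamard-differentiable at $F$ with derivative $\theta'_F$, then
$$\sqrt{n}\bigl(\theta(\mathbb{F}_n)-\theta(F)\bigr)\stackrel{w}{\to}\theta'_F(Z)
\quad\text{and}\quad
\sqrt{n}\bigl(\theta(\mathbb{F}^*_n)-\theta(\mathbb{F}_n)\bigr)\stackrel{w}{\to}\theta'_F(Z),$$
the latter conditionally on the data, for almost all data sequences, so the bootstrap distribution has the same limit as the sampling distribution.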
A good source for this with the details is Chapter 23 of Asymptotic Statistics by van der Vaart. He starts off with the mean, and then the delta-method in finite-dimensional cases, and then the infinite-dimensional case. Finally, he talks about higher-order accuracy for sufficiently well-behaved statistics through the Edgeworth expansion.
Best Answer
This is more complicated than it sounds, so it is a good question.
To start off, the bootstrap doesn't work for $\hat\alpha\hat\beta$ when $\alpha=\beta=0$; this is the same phenomenon as the classic example of bootstrap failure, $\hat\alpha^2$ when $\alpha=0$. The reason it doesn't work is that it relies on the delta-method, which doesn't exactly fail but does become much less helpful when the function you're computing has derivative zero at the true parameters (but not at the 'true' parameters in the bootstrap world)
(Here I mean the function $(a,b)\mapsto ab$)
So, this deserves simulation. Here are qqplots for the sampling distribution and four examples of the bootstrap distribution, in samples of size 1000 from some Normal distributions.
At (0,0) the shape is qualitatively right, but the bootstrap distributions are further from the sampling distribution than you might expect. At (10,10) and (10,0) everything is fine: normality, good bootstrap approximation. At (0.1, 0) it's starting to go bad again.
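For what it's worth, here is a bare-bones version of that kind of simulation (Python; I'm treating $\hat\alpha$ and $\hat\beta$ as sample means of two Normal samples of size 1000, which is one reading of the setup, and the numbers of replications are arbitrary). It lines up quantiles of the sampling distribution of $\sqrt{n}(\hat\alpha\hat\beta-\alpha\beta)$ against those of one bootstrap distribution; a qqplot of the two sorted vectors gives the pictures described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_sim, n_boot = 1000, 2000, 2000
alpha, beta = 0.0, 0.0          # also try (10, 10), (10, 0), (0.1, 0)

# Sampling distribution of sqrt(n) * (alpha_hat * beta_hat - alpha * beta)
sampling = []
for _ in range(n_sim):
    x = rng.normal(alpha, 1, n)
    y = rng.normal(beta, 1, n)
    sampling.append(np.sqrt(n) * (x.mean() * y.mean() - alpha * beta))

# Bootstrap distribution from one sample (resampling pairs keeps any x-y dependence)
x = rng.normal(alpha, 1, n)
y = rng.normal(beta, 1, n)
boot = []
for _ in range(n_boot):
    i = rng.integers(0, n, n)
    boot.append(np.sqrt(n) * (x[i].mean() * y[i].mean() - x.mean() * y.mean()))

# Poor man's qqplot: compare a few quantiles of the two distributions
qs = [0.025, 0.25, 0.5, 0.75, 0.975]
print("sampling: ", np.round(np.quantile(sampling, qs), 3))
print("bootstrap:", np.round(np.quantile(boot, qs), 3))
```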
When does it work?
If $\alpha$ and $\beta$ are both not near zero then we have $\sqrt{n}(\hat\alpha-\alpha)$ and $\sqrt{n}(\hat\beta-\beta)$ both asymptotically Normal and the bootstrap is correct for each. The product function is differentiable at $(\alpha,\beta)$ with non-zero derivative, and everything is fine.
If $\alpha=0$ and $\beta$ is large, the well-behaved delta-method term $$(\hat\alpha-\alpha)\frac{\partial(\alpha\beta)}{\partial \alpha}=(\hat\alpha-\alpha)\beta$$ dominates the small badly-behaved term and everything is still fine (and similarly if $\beta=0$ and $\alpha$ is large)
But if $\alpha=0$ and $\beta$ is small or vice versa, you get breakdown in the bootstrap and no asymptotic Normality.
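The exact decomposition behind those three cases (just algebra, nothing model-specific) is
$$\hat\alpha\hat\beta-\alpha\beta=\beta(\hat\alpha-\alpha)+\alpha(\hat\beta-\beta)+(\hat\alpha-\alpha)(\hat\beta-\beta).$$
When $\alpha=\beta=0$ only the last term survives: it is $O_p(1/n)$ rather than $O_p(1/\sqrt{n})$, and its limit is a product of Normals rather than a Normal. In the bootstrap world the 'true' parameters are $\hat\alpha$ and $\hat\beta$, which are of order $1/\sqrt{n}$ but not exactly zero, so the linear terms do not drop out of the bootstrap version, and the two distributions need not match.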
Extra credit
Things get even worse with non-zero correlation. Here I take $X\sim N(0,1)$ and $Y=X+N(0,1)$. In a qqplot you can see the $X^2$ component of $XY$ behaving differently from the error component.