Bayesian Statistics does not depend on Markov chains (well in theory), Markov chain Monte Carlo is a method for making the computations in Bayesian Statistics easier (doable). Generally we want to generate data from the posterior distribution which we can easily compute parts of, but not always all of (the normalizing coefficient is usually the hard part), but McMC methods will generate data from the posterior without needing to know some of the parts that are harder to find, that is the benifit of McMC with Bayesian stats (though there are some problems where the posterior is known exactly and McMC is not needed at all). It is very rare that the posterior is uniform, so your example in not realistic, but can still help with understanding.
The short answer to length of burn-in and how long to run the chain is "We Don't Know". The little longer (and more useful) answer is to run the chain for a while, then look at it to see if it looks good (yes that is subjective, but experiance helps, and if we knew the exact answer we would know enough not to need McMC), then often run longer to be sure. I have seen cases where the chain was run for quite a few iterations, it looked good, but the researchers decided to run it for 4 times longer to be sure, and near the end sudenly it jumped to a different part of the distribution that had not been covered before.
More dependency does mean that the chain needs to be run for longer (both burn-in and total runs), but as long as the chain has run long enough the dependency no longer matters. Think of a simple random sample with all values independent, now sort the values into order statistics, they are no longer independent, but the mean and sd are the same and estimate the population just as well. The fact that McMC iterations are dependent does not matter so long as there are enough iterations to fully represent the distribution. Your uniform example shows the risks from running for too short of a time.
Best Answer
First, we need to understand what is a Markov chain. Consider the following weather example from Wikipedia. Suppose that weather on any given day can be classified into two states only: sunny and rainy. Based on past experience, we know the following:
$P(\text{Next day is Sunny}\,\vert \,\text{Given today is Rainy)}=0.50$
Since, the next day's weather is either sunny or rainy it follows that:
$P(\text{Next day is Rainy}\,\vert \,\text{Given today is Rainy)}=0.50$
Similarly, let:
$P(\text{Next day is Rainy}\,\vert \,\text{Given today is Sunny)}=0.10$
Therefore, it follows that:
$P(\text{Next day is Sunny}\,\vert \,\text{Given today is Sunny)}=0.90$
The above four numbers can be compactly represented as a transition matrix which represents the probabilities of the weather moving from one state to another state as follows:
$P = \begin{bmatrix} & S & R \\ S& 0.9 & 0.1 \\ R& 0.5 & 0.5 \end{bmatrix}$
We might ask several questions whose answers follow:
Q1: If the weather is sunny today then what is the weather likely to be tomorrow?
A1: Since, we do not know what is going to happen for sure, the best we can say is that there is a $90\%$ chance that it is likely to be sunny and $10\%$ that it will be rainy.
Q2: What about two days from today?
A2: One day prediction: $90\%$ sunny, $10\%$ rainy. Therefore, two days from now:
First day it can be sunny and the next day also it can be sunny. Chances of this happening are: $0.9 \times 0.9$.
Or
First day it can be rainy and second day it can be sunny. Chances of this happening are: $0.1 \times 0.5$.
Therefore, the probability that the weather will be sunny in two days is:
$P(\text{Sunny 2 days from now} = 0.9 \times 0.9 + 0.1 \times 0.5 = 0.81 + 0.05 = 0.86$
Similarly, the probability that it will be rainy is:
$P(\text{Rainy 2 days from now} = 0.1 \times 0.5 + 0.9 \times 0.1 = 0.05 + 0.09 = 0.14$
In linear algebra (transition matrices) these calculations correspond to all the permutations in transitions from one step to the next (sunny-to-sunny ($S_2S$), sunny-to-rainy ($S_2R$), rainy-to-sunny ($R_2S$) or rainy-to-rainy ($R_2R$)) with their calculated probabilities:
On the lower part of the image we see how to calculate the probability of a future state ($t+1$ or $t+2$) given the probabilities (probability mass function, $PMF$) for every state (sunny or rainy) at time zero (now or $t_0$) as simple matrix multiplication.
If you keep forecasting weather like this you will notice that eventually the $n$-th day forecast, where $n$ is very large (say $30$), settles to the following 'equilibrium' probabilities:
$P(\text{Sunny}) = 0.833$
and
$P(\text{Rainy}) = 0.167$
In other words, your forecast for the $n$-th day and the $n+1$-th day remain the same. In addition, you can also check that the 'equilibrium' probabilities do not depend on the weather today. You would get the same forecast for the weather if you start of by assuming that the weather today is sunny or rainy.
The above example will only work if the state transition probabilities satisfy several conditions which I will not discuss here. But, notice the following features of this 'nice' Markov chain (nice = transition probabilities satisfy conditions):
Irrespective of the initial starting state we will eventually reach an equilibrium probability distribution of states.
Markov Chain Monte Carlo exploits the above feature as follows:
We want to generate random draws from a target distribution. We then identify a way to construct a 'nice' Markov chain such that its equilibrium probability distribution is our target distribution.
If we can construct such a chain then we arbitrarily start from some point and iterate the Markov chain many times (like how we forecast the weather $n$ times). Eventually, the draws we generate would appear as if they are coming from our target distribution.
We then approximate the quantities of interest (e.g. mean) by taking the sample average of the draws after discarding a few initial draws which is the Monte Carlo component.
There are several ways to construct 'nice' Markov chains (e.g., Gibbs sampler, Metropolis-Hastings algorithm).