Binary branching is just a simplification to make it easier to explain without math. The actual math is very simple, and can handle unequal probabilities.
At the simplest level, a branching occurs when you can write the wavefunction as a sum
$$|\psi \rangle = |\psi_1 \rangle + |\psi_2 \rangle$$
where $|\psi_1 \rangle$ and $|\psi_2 \rangle$ are orthogonal and decohered, i.e. there is no reasonable physical process that can make them overlap again. In this case we colloquially describe the two terms as "worlds" or "branches", and the probability of being in each one is the squared norm $\langle \psi_i | \psi_i \rangle$, which can be any number between zero and one (the two add up to one when $|\psi \rangle$ is normalized). The same logic goes for branching into more than two "worlds" at once, and for repeated branching: you just get a sum of many terms, and the probability of each one is its squared norm.
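To spell out the many-branch case, here is a short worked version. The symbols $c_i$, $d_{ij}$, $|i\rangle$ and $|i,j\rangle$ are notation introduced here for illustration, with the kets taken to be normalized and mutually orthogonal, so that $|\psi_i\rangle = c_i |i\rangle$:
$$|\psi \rangle = \sum_i c_i |i \rangle, \qquad p_i = \langle \psi_i | \psi_i \rangle = |c_i|^2, \qquad \sum_i p_i = \langle \psi | \psi \rangle = 1.$$
If branch $i$ later branches again, its term becomes
$$c_i |i \rangle \to \sum_j c_i \, d_{ij} |i, j \rangle, \qquad \sum_j |d_{ij}|^2 = 1, \qquad p_{ij} = |c_i|^2 |d_{ij}|^2,$$
so the weights simply multiply along a history, and nothing forces them to be equal.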
After some comments, I get the feeling you really want a discussion of where the probability in the many worlds interpretation "comes from". Again, this is a very subjective and debatable thing, but my favorite take on it is "self-locating uncertainty".
Suppose that somebody kidnaps you, blindfolds you, and takes you somewhere in Uzbekistan. When you come to your senses, are you closer to Samarkand than Tashkent? You don't know for sure, so you can only answer in terms of probabilities. This is self-locating uncertainty: you're certainly in a definite place, and it's not like there are many copies of you running around, but there's probability nonetheless. You can use a variety of information to help. For example, if you weight by area, about 85% of the country is closer to Samarkand. (But this doesn't mean there are $85$ copies of you near Samarkand and $15$ copies of you near Tashkent!) If you weight by population instead, substantially more people are closer to Tashkent, because it's the capital. Of course, which weighting is the correct choice depends on how the kidnappers set things up.
Now, suppose that after the spin of a particle is measured by a device, the state is
$$|\psi \rangle = \sqrt{0.85} |\text{spin up measured} \rangle + \sqrt{0.15} |\text{spin down measured} \rangle.$$
You are living in one and only one branch of the wavefunction, but until you look at what the device is reading, you don't know which. At best, you can assign probabilities. The core assumption of many worlds is that the correct choice of probability (i.e. the choice that corresponds to what you actually observe, when averaged over many measurements) is the squared magnitude of each branch's coefficient, i.e. you assign an 85% chance to observing spin up.
If you ask where this assumption comes from, it's a perfectly legitimate question! However, the point is, there's no principle that says the probabilities have to be equal across branches. That's like saying every day must have a 50% chance of rain because it can either be rainy or not.
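As a rough illustration of the difference between weighting branches by squared amplitude and counting them equally, here is a minimal sketch. It does not derive the rule: it simply samples outcomes with the assumed weights. The names `amplitudes`, `weights` and `observed_frequencies` are made up for this example.

```python
import random

# Branch amplitudes taken from the example state; the labels are illustrative.
amplitudes = {"up": 0.85 ** 0.5, "down": 0.15 ** 0.5}

# Assumed rule: the weight of a branch is the squared magnitude of its amplitude.
weights = {outcome: abs(a) ** 2 for outcome, a in amplitudes.items()}


def observed_frequencies(n_trials, seed=0):
    """Sample n_trials independent measurements using the branch weights."""
    rng = random.Random(seed)
    outcomes = list(weights)
    draws = rng.choices(
        outcomes, weights=[weights[o] for o in outcomes], k=n_trials
    )
    return {o: draws.count(o) / n_trials for o in outcomes}


freqs = observed_frequencies(100_000)
print("weights:    ", weights)
print("frequencies:", freqs)
```

With many trials the "up" frequency should settle near 0.85 rather than near 0.5, even though there are only two branches.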
Sequential projections can rotate a state. There is nothing non-causal about this, since there is no way of measuring (projecting) without interfering with the state. The third measurement acts on a fundamentally different state from the one the first measurement acted on, in a way that is not captured by your pebble example.
This is not related to MWI in any particular way. MWI is about how we interpret the probabilities involved, but it makes the same experimental predictions as e.g. the Copenhagen interpretation.
I think your example is well captured by the sequential Stern-Gerlach experiment. In short, you take a stream of electrons and divide it into spin up and spin down as measured along the z-axis. You discard the spin down electrons. You then take the resulting stream and divide it into spin up and spin down as measured along the x-axis. Again you discard the spin down electrons.
If you then once more divide the electrons into spin up and spin down as measured along the z-axis, you again find spin down electrons! Despite having been filtered away by the first filter, they have returned because of the x-filter in the middle.
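The numbers for this sequence can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming ideal filters and spin-1/2 states written as pairs of amplitudes in the z-basis; the helper names (`inner`, `filter_beam`) and the choice of input beam are hypothetical.

```python
import math

# Spin-1/2 states as (amplitude of z-up, amplitude of z-down).
Z_UP = (1.0, 0.0)
Z_DOWN = (0.0, 1.0)
X_UP = (1 / math.sqrt(2), 1 / math.sqrt(2))


def inner(a, b):
    """Inner product <a|b> of two two-component states."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def filter_beam(state, keep):
    """Ideal filter: return (fraction passing, state after the filter).

    Whatever passes is left in the `keep` state (up to a phase).
    If the fraction is zero, the returned state is not meaningful.
    """
    prob = abs(inner(keep, state)) ** 2
    return prob, keep


# Assumed input beam: any normalized state would do for this illustration.
beam = (math.sqrt(0.5), math.sqrt(0.5))

p_first, beam = filter_beam(beam, Z_UP)   # keep z-up, discard z-down
p_second, beam = filter_beam(beam, X_UP)  # keep x-up, discard x-down

# Third apparatus: chance of finding z-down again after the x-filter.
p_down_after_x = abs(inner(Z_DOWN, beam)) ** 2

# For comparison: chance of z-down right after the first filter, with no x-filter.
p_down_direct = abs(inner(Z_DOWN, Z_UP)) ** 2

print(p_first, p_second, p_down_after_x, p_down_direct)
```

Without the x-filter the z-down fraction is zero, while with it about half of the surviving electrons come out z-down, which is the "return" described above.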
Without going into the quantum mechanics of the situation, we can see that each toss leads to a new world. The next toss leads to yet another world, so the series of heads does not add up, in the way you think, to a world of all heads.
In each world produced by a toss, the next toss still has the usual probabilities of heads or tails.
A world of all heads is possible with sequential tossings making a history of all heads, but not in the way you think:
The "always get heads" scenario assumes that you are free to keep tossing in the same world. You can only "always have gotten heads" along one world line.
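The point about world lines can be made concrete by listing every history of a few tosses. This is only a toy sketch, assuming a fair coin whose two outcomes carry equal weight; `toss_histories` is a name invented for the example.

```python
from fractions import Fraction
from itertools import product


def toss_histories(n_tosses):
    """Map every history of n fair tosses (e.g. 'HHT') to its probability."""
    p_each = Fraction(1, 2) ** n_tosses
    return {"".join(h): p_each for h in product("HT", repeat=n_tosses)}


n = 5
histories = toss_histories(n)
all_heads = "H" * n

# Chance that the next toss is heads, given an all-heads history so far.
longer = toss_histories(n + 1)
p_next_heads = longer["H" * (n + 1)] / histories[all_heads]

print(len(histories), histories[all_heads], p_next_heads)
# prints: 32 1/32 1/2
```

Exactly one of the 32 histories is all heads, and even on that world line the next toss is still an even bet.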
The many worlds interpretation is just mathematics made visual, in my opinion.
Of course, for you to even register that such a world line exists, innumerable worlds will have been created, so that the history in your own world line records that such a world existed! Thinking mathematically is much simpler.