The Many Worlds interpretation is popularly misunderstood. The wave function itself contains a spectrum of universes, one corresponding to each eigenvalue of a given operator. The "splitting" of the "many worlds" is represented by the time evolution of the wave function described by the Schrödinger equation. As Lubos mentions above, these "universes" only become separate through decoherence.
Consider, for example, a wave function in the position basis that is sharply peaked at x=0 (an idealized delta function). This represents one universe. Now time-evolve the wave function using the Schrödinger equation. The peak has now spread out a bit. It is still centered at x=0, but has non-zero values at x=+1 and x=-1. This represents the existence of universes in which the position of the particle is at x=0, x=+1, and x=-1. In some sense there are "more" universes at x=0 than at x=±1, because the wave function is more highly peaked at x=0. This is where some of the difficulty in the Many Worlds interpretation comes in: what ontology to use to describe the "splitting", "how many universes" are at x=0 vs x=±1, and so on. The main point I want to make is that the "splitting" is just an interpretation of what is happening with the evolution of the wave function according to the Schrödinger equation. Nothing "more" is actually happening. You model the "splitting" using the tried-and-true Schrödinger evolution of the wave function.
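To make the spreading concrete, here is a minimal sketch that takes a narrow Gaussian as a stand-in for the delta function (an assumption, since a true delta function is not normalizable) and uses the textbook closed-form width of a free Gaussian packet. The units (`HBAR = MASS = 1`), the initial width `SIGMA_0`, and the function names are illustrative choices, not something from the discussion above.

```python
import math

# Assumed natural units and an assumed initial width; both are illustrative.
HBAR = 1.0
MASS = 1.0
SIGMA_0 = 0.1

def packet_width(t, sigma_0=SIGMA_0):
    """Standard deviation of |psi|^2 for a free Gaussian packet at time t."""
    return sigma_0 * math.sqrt(1.0 + (HBAR * t / (2.0 * MASS * sigma_0**2)) ** 2)

def probability_density(x, t, sigma_0=SIGMA_0):
    """|psi(x, t)|^2 for a free Gaussian packet initially centered at x = 0."""
    s = packet_width(t, sigma_0)
    return math.exp(-x * x / (2.0 * s * s)) / (math.sqrt(2.0 * math.pi) * s)

# Compare the density at x = -1, 0, +1 before and after some evolution.
for t in (0.0, 0.5):
    print(t, [probability_density(x, t) for x in (-1.0, 0.0, 1.0)])
```

At the later time the density at x=±1 is no longer negligible, yet it is still smaller than at x=0, which is the sense in which there are "more" universes at x=0.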
Binary branching is just a simplification to make it easier to explain without math. The actual math is very simple, and can handle unequal probabilities.
At the simplest level, a branching occurs when you can write the wavefunction as a sum
$$|\psi \rangle = |\psi_1 \rangle + |\psi_2 \rangle$$
where $|\psi_1 \rangle$ and $|\psi_2 \rangle$ are orthogonal and decohered, i.e. there is no reasonable physical process that can make them overlap again. In this case we colloquially describe the two terms as "worlds" or "branches", and the probability of being in each one is the squared norm $\langle \psi_i | \psi_i \rangle$, which can be any number between zero and one. The same logic goes for branching into more than two "worlds" at once, and repeated branching: you just get a sum of many terms, and the probability of each one is its squared norm.
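As a small sketch of that bookkeeping, the snippet below stores two branches as lists of complex amplitudes in some arbitrary three-dimensional basis and reads off each branch's weight as $\langle \psi_i | \psi_i \rangle$. The vectors and helper names are made up for illustration, and the code only checks orthogonality; whether two branches are decohered is a physical question it cannot settle.

```python
import math

def inner(u, v):
    """Inner product <u|v> for amplitude lists (conjugate-linear in u)."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def branch_weights(branches):
    """Weight of each branch, taken as <psi_i|psi_i>."""
    return [inner(b, b).real for b in branches]

# Two hypothetical orthogonal, unnormalized branches with unequal weights.
psi_1 = [math.sqrt(0.7), 0.0, 0.0]
psi_2 = [0.0, math.sqrt(0.2), 1j * math.sqrt(0.1)]

# The full state is just the sum of the branches.
psi = [a + b for a, b in zip(psi_1, psi_2)]

weights = branch_weights([psi_1, psi_2])
print(weights, sum(weights), inner(psi, psi).real)
```

Because the branches are orthogonal, the weights add up to the squared norm of the full state, with no requirement that they be equal.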
After some comments, I get the feeling you really want a discussion of where the probability in the many worlds interpretation "comes from". Again, this is a very subjective and debatable thing, but my favorite take on it is "self-locating uncertainty".
Suppose that somebody kidnaps you, blindfolds you, and takes you somewhere in Uzbekistan. When you come to your senses, are you closer to Samarkand than Tashkent? You don't know for sure, so you can only answer in terms of probabilities. This is self-locating uncertainty: you're certainly in a definite place, and it's not like there are many copies of you running around, but there's probability nonetheless. You can use a variety of information to help. For example, if you weight by area, about 85% of the country is closer to Samarkand. (But this doesn't mean there are $85$ copies of you near Samarkand and $15$ copies of you near Tashkent!) But if you weight by population, substantially more of the population is closer to Tashkent, because it's the capital. Of course, which weighting is the correct choice depends on how the kidnappers set things up.
Now, suppose that after the spin of a particle is measured by a device, the state is
$$|\psi \rangle = \sqrt{0.85} |\text{spin up measured} \rangle + \sqrt{0.15} |\text{spin down measured} \rangle.$$
You are living in one and only one branch of the wavefunction, but until you look at what the device is reading, you don't know which. At best, you can assign probabilities. The core assumption of many worlds is that the correct choice of probability (i.e. the choice that corresponds to what you actually observe, when averaged over many measurements) is to take the coefficient of each branch and take its norm squared, i.e. to assign an 85% chance to observing spin up.
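The phrase "when averaged over many measurements" can be illustrated with a short simulation. Note that this sketch assumes the norm-squared rule as the sampling weight; it does not derive it. The trial count, seed, and function name are arbitrary choices.

```python
import random

def observed_frequency(p_up, n_trials, seed=0):
    """Fraction of simulated measurements that read 'spin up'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ups = sum(1 for _ in range(n_trials) if rng.random() < p_up)
    return ups / n_trials

# Coefficient of the 'spin up measured' branch in the state above.
amplitude_up = 0.85 ** 0.5
p_up = abs(amplitude_up) ** 2  # norm squared of the coefficient

freq = observed_frequency(p_up, 100_000)
print(p_up, freq)
```

The observed frequency settles near 0.85 rather than near the 0.5 that naive branch counting would suggest.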
If you ask where this assumption comes from, it's a perfectly legitimate question! However, the point is, there's no principle that says the probabilities have to be equal across branches. That's like saying every day must have a 50% chance of rain because it can either be rainy or not.
Best Answer
A simpler example than putting humans inside your equations might be more clear. Imagine a single electron with some spin.
It is going to enter a Stern-Gerlach device as a beam going in the positive y direction, and the device will deflect a spin-up beam entirely left and a spin-down beam entirely right. These directions (y, left, and right) were all details determined by the initial setup of the beam and the device.
But the device does this not because it is magical. It does so because it has an inhomogeneous magnetic field, there is a particular Hamiltonian for magnetic fields and spin, and the Hamiltonian dictates how things evolve. So anyone who agrees about how the Schrödinger equation evolves states agrees this happens. That includes Copenhagen, the Many-Worlds Interpretation (which was not called Many Worlds by its creator, and I think its creator did not mention the words "many worlds"), Decoherence, Ithaca, dBB, etc.
So the initial condition for the Schrödinger equation is a wave that is a beam spread in the x and z directions, with an initial probability current pointing in the y direction. And it evolves according to the Schrödinger equation. Now, what if the spin of the particle is a superposition of spin up and spin down? Again, we can just mathematically change the initial wavefunction and then evolve it according to the Schrödinger equation, using the particular Hamiltonian that accurately describes the actual inhomogeneous magnetic field in the actual Stern-Gerlach device. Every theory that agrees with the Schrödinger equation can evolve it, and they all predict the same thing. They see two beams coming out, one polarized as spin up and one polarized as spin down.
And again, any theory that uses the Schrödinger equation can track the so-called probability current and see streamlines in the initial beam, some of which end up in the left-deflected beam and some of which end up in the right-deflected beam. And they can see the beam, as a whole, mathematically split into two like a stream that forks when there is a hill in the way. They see that more of the beam is deflected left or right (in the L2 norm sense) depending on how large the spin-up and spin-down components of the superposition were in the initial wave.
And the Schrödinger equation when used for the actual experimental setup also predicts that the spin of the particle continuously evolves over time so that eventually the left beam has a spin that is up and the spin of the right beam has a spin that is down.
Nobody disagrees that this is exactly what the Schrödinger equation predicts for the actual experimental setup. So the equation that makes the actual predictions predicts that one beam with spin not aligned purely in the up or down direction becomes (continuously over time) two beams each of which has a spin that is aligned in the up or down direction.
Effectively the Stern-Gerlach device has co-evolved the spin and the position so that the two are now entangled. You have (wave on the left and spin up) added to (wave on the right and spin down), and the two terms are orthogonal for two reasons now: the spin and the position. When it was an incoming beam with spin in a superposition of up and down, the terms were orthogonal for only one reason, the spin, which could have been written as a superposition of many different orthogonal states. Now that the interaction has separated the beam into two spatially separate beams, future things that interact based on where the beam is can, by surrogacy, be correlated with the spin.
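A toy bookkeeping model can show what this entanglement buys you. The sketch below is not a solution of the Schrödinger equation for a real device: it replaces the continuous beam with three coarse position labels ("center", "left", "right") and treats the device as an idealized map that sends spin up to the left and spin down to the right. The amplitudes 0.6 and 0.8 and all function names are illustrative assumptions.

```python
def stern_gerlach(state):
    """Idealized device: move spin-up amplitude left, spin-down amplitude right."""
    out = {}
    for (position, spin), amplitude in state.items():
        new_position = "left" if spin == "up" else "right"
        key = (new_position, spin)
        out[key] = out.get(key, 0.0) + amplitude
    return out

def probability(state, position=None, spin=None):
    """Total squared amplitude of terms matching the given position and/or spin."""
    total = 0.0
    for (pos, s), amplitude in state.items():
        if position is not None and pos != position:
            continue
        if spin is not None and s != spin:
            continue
        total += abs(amplitude) ** 2
    return total

# Incoming beam: one position, spin in a superposition (hypothetical amplitudes).
incoming = {("center", "up"): 0.6, ("center", "down"): 0.8}
outgoing = stern_gerlach(incoming)

print(probability(outgoing, position="left"))               # weight of the left beam
print(probability(outgoing, position="left", spin="down"))  # no spin down on the left
```

After the device, anything that responds only to position (left versus right) is automatically correlated with spin, which is the "by surrogacy" point above.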
Great. And everyone that uses the Schrödinger equation and is also willing to bother to use it on the actual experimental setup (a rarely utilized option) agrees this is what happens.
Now it might help to reveal a big fact that isn't mentioned much, and again it is just about the Schrödinger equation. If you have two or more particles, then the wave function is not a wave like the electric field or the magnetic field; it is not defined in actual space. It is defined on configuration space, a space that has a different x for every particle, a different y for every particle, and a different z for every particle. So for two particles your function is $\Psi=\Psi(x_1,y_1,z_1,x_2,y_2,z_2,t)$, and for every point in the 6d configuration space you are specifying a full configuration: you are saying where each and every particle is.
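To get a feel for how quickly configuration space grows, here is a rough counting sketch. It assumes you wanted to tabulate $\Psi$ on a grid with a fixed number of points along each coordinate axis; the grid resolution and function names are illustrative.

```python
def config_space_dimension(n_particles, spatial_dims=3):
    """Number of coordinates needed to specify one full configuration."""
    return spatial_dims * n_particles

def grid_points(n_particles, points_per_axis, spatial_dims=3):
    """Number of grid points needed to tabulate Psi at one instant."""
    return points_per_axis ** config_space_dimension(n_particles, spatial_dims)

for n in (1, 2, 3, 10):
    print(n, config_space_dimension(n), grid_points(n, points_per_axis=10))
```

Each added particle multiplies the number of grid points by the same factor, so the count grows exponentially with the number of particles.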
So this space is huge. Suppose two beams go off in slightly different directions, bounce off separate things, and head off in different directions with possibly different and changing speeds. When you have $10^{26}$ or more particles, the chance that the two beams ever cross each other once they get pretty far apart is vanishingly small. This is key.
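One common way to put a number on "vanishingly small" is sketched below. It rests on a simplifying assumption that is not part of the argument above: that each beam leaves the other particles in a product state, so the overlap between the two beams is the product of one overlap factor per particle. The per-particle value 0.99 is purely illustrative, and the function works with logarithms because the product itself underflows.

```python
import math

def log10_total_overlap(n_particles, per_particle_overlap):
    """log10 of the overlap between two product states of n particles,
    assuming every particle contributes the same overlap factor."""
    return n_particles * math.log10(per_particle_overlap)

# Even if each particle's two states are 99% alike, the total overlap collapses.
for n in (1, 100, 10**4, 10**26):
    print(n, log10_total_overlap(n, 0.99))
```

Under that assumption the overlap shrinks exponentially with the number of particles involved, which is why the beams effectively never meet again.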
Because the Schrödinger equation is linear. Suppose beam one by itself evolves its own way, beam two by itself evolves its own way, and those evolved versions never overlap. Then the sum evolves as the sum of those two separately evolved beams, and the probability current at each point is just the probability current of whichever separate beam is there.
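The role of non-overlap can be seen on a small grid. The sketch below checks only the probability density, not the current, and the four-point grid and amplitudes are made up for illustration. The identity it relies on is $|a+b|^2 = |a|^2 + |b|^2 + 2\,\mathrm{Re}(a^* b)$, where the last term is the interference term.

```python
def density(psi):
    """Pointwise |psi|^2 on the grid."""
    return [abs(a) ** 2 for a in psi]

def interference_term(psi_1, psi_2):
    """Pointwise cross term 2*Re(conj(psi_1) * psi_2) in |psi_1 + psi_2|^2."""
    return [2.0 * (complex(a).conjugate() * complex(b)).real
            for a, b in zip(psi_1, psi_2)]

def add(psi_1, psi_2):
    return [a + b for a, b in zip(psi_1, psi_2)]

# Two beams with disjoint support: nonzero on different grid points.
beam_1 = [0.6, 0.0, 0.0, 0.0]
beam_2 = [0.0, 0.0, 0.0, 0.8j]

# A hypothetical beam that does overlap beam_1 on the first grid point.
overlapping = [0.3, 0.0, 0.0, 0.0]

print(interference_term(beam_1, beam_2))       # zero everywhere
print(interference_term(beam_1, overlapping))  # nonzero where they overlap
```

When the beams never occupy the same point of configuration space, the cross term vanishes and each beam's density is exactly what it would be if the other beam were absent.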
So the Schrödinger equation predicts that, once the two beams reach a point in time after which they will never overlap again, each evolves in absolutely every way as if it were the only beam. And this happens when devices like Stern-Gerlach devices separate them, and further interactions then make lots of other particles evolve differently in the two different beams. So far there is no mention of Many Worlds. And any theory that uses the Schrödinger equation for the actual setup will predict this.
So now let's talk about Many Worlds. These beams that never again overlap act like they are the only beam; they act like they are a world unto their own. The math says so, the same math used by everyone who uses the Schrödinger equation. So why not just let them be a World unto their own? Your personal experience is generated by the state of your neurons and such, and the wave function of every particle includes the configurations of your neurons and such. So there is a beam with your neurons in some configurations (the wave is nonzero there) and there is a completely separate non-overlapping beam. (The beam is defined in configuration space, so if the beam is separated in the direction belonging to just one particle, the whole beam is separated, just as a beam separated in the x direction is separated as a whole.) The separate non-overlapping beam has your neurons in a possibly different collection of configurations. Or possibly the same: maybe the beam has separated in other places but the dynamics of your neurons haven't changed yet. Either way the beams as a whole are separate, so the math that every single person who uses the Schrödinger equation trusts says that neither beam will affect the so-called probability current or the evolution of the other if they never overlap again.
OK, so in the future devices make beams become non-overlapping. And if they interact later with large numbers of particles, the huge number of particles makes it incredibly unlikely they will ever overlap again, so each can and does evolve as if it were the only beam. In Many Worlds you call those worlds (the creator didn't, but the first major popularizer did). But that is just because they then evolve as if they were the only beam in the world. The key is their separateness, a separateness that is already and solely predicted by the Schrödinger equation. That same Schrödinger equation predicts that each of those beams exists and each acts as if it were the only one.
So there is one configuration space and one wavefunction, the one predicted by the one Schrödinger equation. It's just that it naturally evolves into separated beams that then act independently, as if each were the only beam in the world. In Many Worlds you realize that each beam acts on its own after it separates, and that any subjective experience is based on your body's configuration, so each separately acting beam has a separately acting arrangement of your body. The different beams therefore include possibly differently arranged yous, and each is just as important and valid, because we predict them all and each acts like it is the only one that survived the beam splitting and the interaction with a large number of particles.
Instead of calling it the Copenhagen interpretation you could call it the Solipsism interpretation (after all, Many Worlds wasn't called Many Worlds by its creator), because with a Solipsist's Interpretation only one of those beams can be real: the one with "you" in it.
And a die-hard solipsist will sometimes go so far as to throw the baby of science out the window with the bathwater of non-solipsism.
Why? Because the Schrödinger equation predicted all these separate beams. So how can you end up with just one? You'd have to do something other than the Schrödinger equation. And so you'd have to draw a line somewhere and say that some magic happens somewhere and the Schrödinger equation doesn't hold there or then. But if we do an experiment to test that, you are always wrong.
Why? Because by carefully arranging beam reflectors we can get those beams to bounce back and overlap, and show that every beam was still there all along, every single one, for any length of time. Always, every single time. So you can just find out how many beam reflectors we can make and how precisely we can arrange them (find out our current technical expertise) and then postulate that raw unobservable magic happens (unobservable because you specifically designed it to happen where we can't test it) and that the Schrödinger equation doesn't hold there because it interferes with your self-centered egotistical solipsism. Which is really just a kind of sexism and racism and hating of other people and things that is so profound that you just refuse to believe that anything other than you exists (even a body just like yours that was entirely the same up until one day it dynamically evolved the way you would if you interacted with a world where a Stern-Gerlach device deflected a beam differently than the one you personally subjectively experienced), to the point where you make unscience to defend your solipsism.
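The logic of such a recombination test can be sketched with a toy two-path interferometer. This is not the Stern-Gerlach setup itself: the state is reduced to two path amplitudes, and the symmetric beam splitter (a Hadamard-type map) is an assumed idealization. The sketch compares recombining both beams with recombining after one beam has been removed by hand.

```python
import math

def beam_splitter(state):
    """Idealized symmetric beam splitter acting on two path amplitudes."""
    a, b = state
    s = 1.0 / math.sqrt(2.0)
    return (s * (a + b), s * (a - b))

def detection_probabilities(state):
    """Squared magnitude of each path amplitude."""
    return tuple(abs(x) ** 2 for x in state)

# Split one incoming beam into two paths.
split = beam_splitter((1.0, 0.0))

# Case 1: both beams persist and are brought back together.
both_kept = detection_probabilities(beam_splitter(split))

# Case 2: hypothetical "collapse" onto the first path before recombining.
one_removed = detection_probabilities(beam_splitter((1.0, 0.0)))

print(both_kept)    # everything arrives at the first detector
print(one_removed)  # the two detectors fire equally often
```

The two cases give different detector statistics, so an experiment that manages to recombine the beams can tell whether both were still present.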
Why unscience? Because drawing the lines in different places means when our technical expertise advances we get an infinite number of different predictions. So Copenhagen can't even make predictions. It, Copenhagen, isn't even science when it wants to have magical collapses to defend solipsism. And there is no reason to hold one of the splittings of the beam as special just because you experienced it.
Let's get this straight, there is nothing wrong about treating those other worlds as if they don't exist because they don't affect your world. But to claim that something happened to them just because you want to feel special and to destroy science itself and the ability to make predictions just because you want to have your individual and personal subjective experience be the center of the universe is misguided.
Ignore the other worlds because you can. And be aware of when you can and don't do it too early or in situations where it doesn't apply. Pretending they aren't there is a bit dishonest and can lead to wrongness.