a) What would happen if we did NOT detect an interference pattern at D0 in the 8 ns before the entangled idler photons reach D3 or D4, and then removed the BSa and BSb beam splitters very quickly so that the idler photons travel to D1 or D2 instead? The "which path" information would then be unknown, so shouldn't we actually see an interference pattern at D0?
Looking at the data at D0 alone, you never see an interference pattern. Photons come through the initial double slit at a particular rate. Each one that comes through undergoes spontaneous parametric down-conversion, so we have a pair of photons. When one half of the pair goes to D0, we detect it. Wherever it lands, it lands. You have to collect many results before you get any pattern at all, so you can't say "no interference" from any single hit. In particular, an interference pattern is really a frequency histogram. If you have two histograms with the troughs of one aligned with the peaks of the other, the combined aggregate frequency histogram doesn't have peaks and troughs. So you can't wait until you see no pattern and then remove the beam splitters.
The pattern comes from labeling each hit at D0 with a time and then later sorting the hits into groups, with the peaks of one group on top of the troughs of another group. So the "interference pattern" comes later. Even without the additional beam splitters it comes later, because R1 (coincidence with D1) and R2 (coincidence with D2) label the original D0 collection into two distinct groups. Imagine you see a pattern that doesn't look like an interference pattern, and then 8 ns after each hit you get the information needed to label that individual dot with a happy face or a sad face. You then see that the happy-face distribution develops peaks and troughs, the sad-face distribution develops peaks and troughs, and the troughs of one are the peaks of the other and vice versa.
Removing the additional beam splitters just means you have two groups to sort the results into instead of four. You don't see an interference pattern in the aggregate results at D0; you only see it in the two histograms after you sort the results. And you don't know which group any particular result will be sorted into until 8 ns later, when you detect the idler at D1 or D2.
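The sorting argument can be illustrated with a small toy simulation. This is only a sketch under assumed numbers, not a model of the real apparatus: the D0 positions are taken to be uniform over two fringe periods, and the chance that the idler later fires D1 is assumed to be cos² of a position-dependent phase (so the chance for D2 is sin²). The function name `simulate_d0_hits` and all parameters are hypothetical.

```python
import math
import random

def simulate_d0_hits(n_pairs=20000, n_bins=20, seed=0):
    """Toy model: label each D0 hit by which idler detector fires later."""
    rng = random.Random(seed)
    r1 = [0] * n_bins  # D0 hits later found in coincidence with D1
    r2 = [0] * n_bins  # D0 hits later found in coincidence with D2
    for _ in range(n_pairs):
        # Assumption: unsorted D0 positions are uniform on [0, 1).
        x = rng.random()
        # Assumption: the D1 probability has fringes with period 0.5 in x.
        p_d1 = math.cos(2 * math.pi * x) ** 2
        b = int(x * n_bins)
        if rng.random() < p_d1:
            r1[b] += 1  # "happy face" label arrives after the D0 hit
        else:
            r2[b] += 1  # "sad face" label
    total = [a + c for a, c in zip(r1, r2)]  # what D0 alone records
    return r1, r2, total

if __name__ == "__main__":
    r1, r2, total = simulate_d0_hits()
    for name, hist in (("R1", r1), ("R2", r2), ("D0 total", total)):
        print(f"{name:9s}", hist)
```

In this toy model the two labeled histograms show complementary peaks and troughs, while the unlabeled total stays flat up to counting noise, which is the behavior described above.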
b) If we moved either the mirror Mb or Ma just a tiny bit from its position, such that the red or blue path for the photon would be a bit different in length, wouldn't we then be able to tell, via the time it took to reach the D2 or D1 detector, which of the two slits it came from?
Firstly, there is some leeway: a beam that hasn't been on forever isn't perfectly monochromatic, so there is some room to move the mirrors a bit. Secondly, if you move the mirrors, D1 and D2 will fire at different rates, so you will now sort the results at D0 into two unequal groups. The peaks and troughs of the two subgroups no longer line up perfectly, and the larger group looks less and less like an interference pattern. At some magic distance only one of the detectors D1 or D2 goes off (let's say only D1), and you are then sorting the results of D0 into just one group.
c) Suppose we replaced D0 with another double slit, with the red and blue paths each pointed at one of the two slits. In the case where the idler photon reaches D3/D4, would the signal photon go through exactly one of the slits and hence not interfere with itself?
Short answer: you are correct. However, the wave function has to be treated in multiparticle space; a wave function is not a field in three-dimensional space. And this happens even without adding a second double slit. That's why the histograms of R3 and R4 don't have peaks and troughs (well, they have one peak each and no trough, unless you count the fact that they are concentrated in a finite region as a trough at infinity). So a second double slit is irrelevant to R3 and R4.
In case you meant to use a second double slit for something else I'll go into more detail about what, if anything, it would do.
A double slit is not magic, and it only works in a very particular way in very particular situations. For instance, the original double slit has a laser wavefront coming into it, so the waves coming out of each slit are in phase with each other. Furthermore, the wave is for just one particle and there is no entanglement. Those facts serve to determine exactly where the peaks will be if you place a screen in front of it.
The light heading over to D0 is very different from monochromatic, in-phase, plane-wave laser light. It is entangled light: the parametric down-conversion produces entangled light, so the two red beams coming out of the SPDC region are entangled with each other. It's like there is a superposition of two particles (one traveling along each of those red beams). So each of those pairs of beams coming out of the SPDC region is a superposition of states of different polarization. But worse than that, they are entangled, so by themselves they don't individually have the properties associated with the entanglement.
The red and blue beams could be deflected so that their propagation vectors are orthogonal to a screen with holes and directed towards the holes in the screen. If the holes are large compared to the beam widths, it's like not having a screen with holes at all. If the holes are small, then D0 will fire less often as the screen absorbs some photons. So you can reproduce those aspects of a standard double slit setup.
But the two beams are not arriving in phase and each is really an entangled superposition of different polarizations. So you can't expect a double slit there to work exactly the same as it would in a normal double slit set up.
Now, normally in quantum mechanics you can track the lines of probability current and even write dynamical equations for them. If you do that, you see that absorbing the edges of a beam makes the surviving (new) edge flare out more.
So the new double slit will flare the beams more. The troughs weren't identically zero, since the original slits were finite sized as well as finitely spaced (each beam had some thickness). But farther than this size and wavelength from the central region of D0, the light from the new double slit is now more spread out, so you should detect more peaks and troughs. They happen because of the difference in path length between red and blue. There are multiple subgroups with different locations for peaks and troughs, because the red and blue beams don't have a constant phase difference, owing to the entanglement with the other beams.
Therefore, if the signal photon hits an area of the screen that it could not possibly hit when interfering with itself (the gaps in an interference pattern), would we know for certain, 8 ns beforehand and with just a single photon pair (signal/idler), that the which-path information is known in this special case?
The frequency histogram at D0 is the sum of the histograms for R1, R2, R3, and R4. R3 and R4 have one central peak each, offset from each other since the red and blue beams aim for different places. And R1 and R2 each have their peaks in the other's troughs.
When you see a hit in the trough of R1 you now know it is much more likely that D1 does not go off 8ns later.
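The "much more likely" statement is just conditional probability over the four sub-histograms: given a hit at position x, the chance that a particular idler detector fires is that detector's rate at x divided by the sum of all four. Below is a minimal sketch with made-up shapes. The Gaussian envelope, fringe period, fringe visibility of 0.9, the offsets of the R3 and R4 peaks, and the equal weighting of the four rates are all assumptions chosen for illustration, as are the function names.

```python
import math

def gaussian(x, center, width):
    """Unnormalized Gaussian bump."""
    return math.exp(-((x - center) / width) ** 2 / 2)

def joint_rates(x, fringe_period=1.0, width=2.0, offset=0.5, visibility=0.9):
    """Toy (unnormalized) coincidence rates R1..R4 at D0 position x."""
    envelope = gaussian(x, 0.0, width)
    fringe = visibility * math.cos(2 * math.pi * x / fringe_period)
    return {
        "D1": envelope * (1 + fringe) / 2,   # R1: peaks where R2 has troughs
        "D2": envelope * (1 - fringe) / 2,   # R2: complementary fringes
        "D3": gaussian(x, -offset, width),   # R3: single peak, no fringes
        "D4": gaussian(x, +offset, width),   # R4: single peak, offset from R3
    }

def detector_odds(x):
    """P(idler detector fires | D0 hit at x) under the toy rates above."""
    rates = joint_rates(x)
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}

if __name__ == "__main__":
    for x in (0.0, 0.5):  # an R1 peak and an R1 trough in this toy model
        odds = detector_odds(x)
        print(x, {name: round(p, 3) for name, p in odds.items()})
```

Because the troughs are not exactly zero (visibility below 1 in this sketch), a hit in an R1 trough makes D1 unlikely rather than impossible, and it says nothing certain about D3 versus D4 from a single pair.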
Let me preface by saying that "coupling" is a favorite physicist word that is perhaps better described linguistically than rigorously; it's deployed in a few different situations.
In general, we say that a coupling exists in quantum mechanics if the evolution of one part of the system depends on another quantity, which could be either classical or quantum. I'll give one example for each.
Suppose the Hamiltonian of a two-level system is the sum of an internal Hamiltonian $H_\mathrm{int}$ and an additional part that depends on some external parameter, say $\theta$:
\begin{equation}
H = H_\mathrm{int} + \theta \sigma_z
\end{equation}
Here the $\sigma_z$ was arbitrary. The point is that this system's evolution depends directly on the parameter $\theta$--maybe it's an external magnetic field, or some other feature of the environment. In this case, we would generally say that the system is "coupled to $\theta$." (You'd often see this in a metrological context, where we might be interested in using a quantum system coupled to an external parameter to measure the parameter.) In this case, there is only one quantum object, evolving under $H$.
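To make the dependence on $\theta$ concrete, here is a minimal sketch under simplifying assumptions that are not in the text above: $H_\mathrm{int} = 0$, $\hbar = 1$ by default, and the system starts in the equal superposition $(|0\rangle + |1\rangle)/\sqrt{2}$. The function name `sigma_x_expectation` is hypothetical. Under these assumptions $\langle \sigma_x \rangle$ oscillates as $\cos(2\theta t/\hbar)$, so watching the oscillation is one way the system "reads out" the parameter.

```python
import cmath
import math

def sigma_x_expectation(theta, t, hbar=1.0):
    """<sigma_x>(t) for a two-level system evolving under H = theta * sigma_z.

    Assumptions: H_int = 0 and the initial state is (|0> + |1>)/sqrt(2).
    """
    # sigma_z eigenvalues are +1 for |0> and -1 for |1>, so each amplitude
    # just picks up a phase exp(-i * E * t / hbar).
    a = cmath.exp(-1j * theta * t / hbar) / math.sqrt(2)  # amplitude of |0>
    b = cmath.exp(+1j * theta * t / hbar) / math.sqrt(2)  # amplitude of |1>
    # <sigma_x> = a* b + b* a = 2 Re(a* b)
    return 2 * (a.conjugate() * b).real

if __name__ == "__main__":
    for theta in (0.0, 0.3, 0.6):
        print(theta, sigma_x_expectation(theta, t=2.0))
```

With $\theta = 0$ the expectation value stays fixed, and any nonzero $\theta$ makes it precess at a rate set by $\theta$.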
Another common system--maybe a little more general--would be the evolution of two different variables both treated quantum mechanically. The idea here is that there would be some operator $A$ characterizing one observable of interest, and another $B$ characterizing a second. Then the Hamiltonian might be:
\begin{equation}
H = H_A + H_B + H_{AB}
\end{equation}
Where $H_A$ doesn't contain any term depending on $B$, $H_B$ doesn't contain any term depending on $A$, and $H_{AB}$ might have terms like $A \cdot B$, $A^2 B$, etc. The reason why this couples the system is that if we now evaluate Heisenberg equations of motion $\dot{A} = \frac{i}{\hbar} \left[ H, A \right]$ we'll find that the $H_{AB}$ term will put terms depending on $B$ into $\dot{A}$ and vice versa. Therefore, solving the equations of motion will require describing both $A$ and $B$. On the other hand, if the equations "decouple" or we do something to decouple them ourselves, we can usually find a solution for $A(t)$ that doesn't depend on $B$ and vice versa.
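As a concrete illustration (an assumed example, not something specific to the question), take two harmonic oscillators with a bilinear coupling, where the mass $m$, frequencies $\omega_A$, $\omega_B$, and coupling strength $g$ are symbols introduced only for this sketch:
\begin{equation}
H = \underbrace{\frac{p_A^2}{2m} + \frac{1}{2} m \omega_A^2 x_A^2}_{H_A} + \underbrace{\frac{p_B^2}{2m} + \frac{1}{2} m \omega_B^2 x_B^2}_{H_B} + \underbrace{g\, x_A x_B}_{H_{AB}}
\end{equation}
Using $[x_A, p_A] = [x_B, p_B] = i\hbar$ and the same convention $\dot{A} = \frac{i}{\hbar} \left[ H, A \right]$, the equations of motion are
\begin{equation}
\dot{x}_A = \frac{p_A}{m}, \qquad \dot{p}_A = -m \omega_A^2 x_A - g\, x_B, \qquad \dot{x}_B = \frac{p_B}{m}, \qquad \dot{p}_B = -m \omega_B^2 x_B - g\, x_A
\end{equation}
Each momentum equation contains the other oscillator's position, so neither pair can be solved on its own unless $g = 0$.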
This is all paralleled in classical mechanics, by the way, where we would call two variables coupled if they appeared in each other's equations of motion.
EDIT: Peter Shor points out that objects can be "indirectly coupled," which is correct but would usually require me to introduce another variable $C$. I think the most general statement of being coupled/uncoupled is asking whether the equations of motion can be solved independently of each other.
Best Answer
Spontaneous parametric down-conversion converts a single incoming photon into two outgoing photons. I think the article is saying that if you measure one photon coming out, there must be a second photon as well. The author is referring to the second as a heralded photon in the sense that measurement of the first photon is a sign that the second (heralded) photon is present.