It all boils down to how much information is available, in principle, in the final state of the beam splitter to distinguish which way the photon went. That information is encoded in the overlap between the beam splitter's two possible final states. The interference is destroyed because the photon gets entangled with the beam splitter, and the amount of entanglement depends on this overlap.
Say, then, that if the photon goes straight through the beam splitter, into state $|{\to}\rangle$, the beam splitter stays put in state $|0\rangle$, whereas if the photon gets deflected into state $|{\downarrow}\rangle$, the beam splitter picks up some upwards momentum and ends in state $|{\Uparrow}\rangle$. If the result is a superposition, then the total state of the system is entangled (ignoring normalization here and below):
$$|\Psi\rangle=|\to\rangle|0\rangle + |{\downarrow}\rangle|{\Uparrow}\rangle .$$
Regardless of what you do to the beam splitter - i.e. measure its state or just forget about it - in the absence of a measurement that introduces further interactions, the information you have available to produce an interference pattern on the photon side is given by the reduced density matrix obtained by taking the partial trace over the beam splitter.
Calculating this object is fairly simple. In the $\{|{\to}\rangle, |{\downarrow}\rangle\}$ basis, it is given by
$$
\mathrm{Tr}_{\mathrm{BS}}(|\Psi\rangle\langle\Psi|)
=
\begin{pmatrix}
1 & \langle{\Uparrow}|0\rangle \\ \langle0|{\Uparrow}\rangle & 1
\end{pmatrix}.
$$
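As a sketch of the intermediate step, assuming only that the two beam-splitter states are individually normalized, the partial trace acts on each term of $|\Psi\rangle\langle\Psi|$ as

$$
\mathrm{Tr}_{\mathrm{BS}}\big(|a\rangle\langle b|\otimes|\beta\rangle\langle\beta'|\big)
= \langle\beta'|\beta\rangle\,|a\rangle\langle b| ,
$$

where $|a\rangle, |b\rangle$ stand for photon states and $|\beta\rangle, |\beta'\rangle$ for beam-splitter states (placeholder labels introduced just for this step). The two diagonal terms therefore pick up $\langle0|0\rangle=\langle{\Uparrow}|{\Uparrow}\rangle=1$, while the two cross terms pick up the overlap between $|0\rangle$ and $|{\Uparrow}\rangle$ and its complex conjugate.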
If the beam splitter states are completely distinguishable, then they are orthogonal and what you get on the photon side is a completely mixed state, $|{\to}\rangle\langle{\to}|+|{\downarrow}\rangle\langle{\downarrow}|$, which is completely classical, and from which no interference can be extracted. Note that this happens regardless of whether you actually measure the beam splitter's momentum or not.
If there is no effect on the beam splitter, on the other hand, the states are the same, and the photon's density matrix corresponds to a pure state, $\left(|{\to}\rangle+|{\downarrow}\rangle\right)\left(\langle{\to}|+\langle{\downarrow}|\right)$. Then you will see complete interference, but you will have no "which way" information available, even in principle.
In any physical realization, of course, you're somewhere in the middle. Most realizations leave the beam splitter in very similar states, which means that $\langle{\Uparrow}|0\rangle$ is very close to 1 and you get good interference. As the states become more distinguishable, the magnitude of this overlap drops and the contrast in the interference fringes is reduced with it.
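To make the dependence on the overlap concrete, here is a minimal numerical sketch using NumPy. It rests on several assumptions that are not part of the argument above: the two beam-splitter states are modelled as real unit vectors in a two-dimensional toy space, their overlap is set by a hypothetical parameter `overlap_angle`, and the state is normalized with a factor $1/\sqrt{2}$, so the diagonal entries come out as $1/2$ rather than $1$. With equal weights on the two paths, the fringe visibility is twice the magnitude of the off-diagonal element.

```python
import numpy as np

def reduced_photon_state(overlap_angle):
    """Photon density matrix in the (straight, deflected) basis.

    overlap_angle is a hypothetical parameter: the angle between the two
    toy beam-splitter states, so that their overlap is cos(overlap_angle).
    """
    # Photon path states
    straight = np.array([1.0, 0.0])
    deflected = np.array([0.0, 1.0])
    # Toy beam-splitter states (assumption: real unit vectors in 2D)
    bs_rest = np.array([1.0, 0.0])
    bs_kicked = np.array([np.cos(overlap_angle), np.sin(overlap_angle)])

    # |Psi> = (|straight>|rest> + |deflected>|kicked>) / sqrt(2)
    psi = (np.kron(straight, bs_rest) + np.kron(deflected, bs_kicked)) / np.sqrt(2)

    # Full density matrix, indexed as (photon, bs, photon', bs')
    rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
    # Partial trace over the beam splitter: sum over bs == bs'
    return np.einsum("ajbj->ab", rho)

def visibility(rho):
    """Fringe visibility for equal-weight paths: 2 * |off-diagonal element|."""
    return 2.0 * abs(rho[0, 1])

if __name__ == "__main__":
    for angle in (0.0, np.pi / 3, np.pi / 2):
        rho = reduced_photon_state(angle)
        print(f"overlap = {np.cos(angle):.3f}, visibility = {visibility(rho):.3f}")
```

Setting `overlap_angle` to zero reproduces the pure-state case, and setting it to $\pi/2$ reproduces the completely mixed case; anything in between gives partial contrast.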
I understand this can feel pretty thin. After all, how are we to know that we've eliminated all possible places where "which way" information may in principle be available? This is in fact how it goes down in the lab, and that's the reason observing things like Hong-Ou-Mandel dips is very, very touchy: if you want two photons to interfere, you need to make sure that they truly are indistinguishable - in spatial profile, displacement, spectrum, and timing - for otherwise there will be (possibly undetected) entanglement with some other mode, and that will reduce or destroy your interference contrast.
Best Answer
"Why does the photon always appear at the same detector?"
I had the same question when I read about it in Penrose's The Road To Reality, but I found in https://arxiv.org/pdf/quant-ph/9610033.pdf that the apparatus must be carefully tuned to get the destructive interference.
Apparently, Mach-Zehnder interferometers are used in other applications where this is not the case.