One of the reasons the original construction of Hamiltonian Monte Carlo can be tricky to understand is that it is more restrictive than necessary, largely to simplify the theoretical proofs. In particular, the negation of the momenta in the deterministic update is indeed practically irrelevant because of the full momentum resampling*. If we include it, however, then we can exploit the fact that the composition of two symmetric proposals is itself symmetric to show that the composite Hamiltonian Monte Carlo update is also symmetric.
The deterministic proposal distribution is a bit tricky because, as noted in the question, it is given by a delta function. Tierney (https://projecteuclid.org/euclid.aoap/1027961031) showed that for any such deterministic update to be symmetric it has to be an involution; in other words, it has to yield the identity when applied twice. Assuming a symplectic integrator, which is time-reversible and volume preserving, this is exactly what the momentum negation guarantees.
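To make the involution property concrete, here is a minimal Python sketch (the standard Gaussian target, step size, and step count are all illustrative assumptions, and `leapfrog` / `flip_after_leapfrog` are hypothetical helper names): a leapfrog trajectory followed by momentum negation, applied twice, returns the initial state up to floating-point error.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """Leapfrog integration of Hamilton's equations for H = U(q) + p^2/2."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)       # initial half step in momentum
    for _ in range(L - 1):
        q += eps * p                 # full step in position
        p -= eps * grad_U(q)         # full step in momentum
    q += eps * p
    p -= 0.5 * eps * grad_U(q)       # final half step in momentum
    return q, p

def flip_after_leapfrog(q, p, grad_U, eps, L):
    """The deterministic HMC update: integrate, then negate the momentum."""
    q, p = leapfrog(q, p, grad_U, eps, L)
    return q, -p

# Standard Gaussian target, U(q) = q^2 / 2, so grad_U(q) = q (an illustrative choice).
grad_U = lambda q: q
q0, p0 = np.array([1.0]), np.array([0.5])

q1, p1 = flip_after_leapfrog(q0, p0, grad_U, eps=0.1, L=20)
q2, p2 = flip_after_leapfrog(q1, p1, grad_U, eps=0.1, L=20)

print(np.allclose(q2, q0), np.allclose(p2, p0))  # True True: the update is an involution
```

Without the negation the second application would continue integrating forward in time instead of retracing the trajectory, and the update would no longer be its own inverse.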
Hamiltonian Monte Carlo is actually much more general, both in terms of theory and implementation, than the original algorithm. For more details see https://arxiv.org/abs/1701.02434, especially the appendix.
* There are partial momentum resampling schemes, from the original scheme of Horowitz to more recent "Extra Chance" and "Generalized" schemes, where the negation is of critical importance to the validity of the algorithm.
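For reference, a partial refresh replaces only part of the momentum with fresh noise instead of resampling it entirely. A one-function sketch, assuming an identity mass matrix so that the momenta are marginally standard normal:

```python
import numpy as np

def partial_momentum_refresh(p, alpha, rng):
    """Horowitz-style partial momentum refresh.

    alpha = 0 recovers the full resampling of standard HMC; alpha near 1
    keeps most of the old momentum, which is why the momentum negation
    (applied on rejection) is critical to the validity of these schemes.
    """
    return alpha * p + np.sqrt(1.0 - alpha**2) * rng.standard_normal(p.shape)
```

Because alpha² + (1 − alpha²) = 1, the refreshed momentum has the same standard normal marginal as a full resample.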
The deterministic Hamiltonian trajectories are useful only because they are consistent with the target distribution. In particular, trajectories with a typical energy project onto regions of high probability of the target distribution. If we could integrate Hamilton's equations exactly and construct explicit Hamiltonian trajectories, then we would already have a complete algorithm and would not need any acceptance step.
Unfortunately, outside of a few very simple examples we can't integrate Hamilton's equations exactly, which is why we have to bring in symplectic integrators. Symplectic integrators construct high-accuracy numerical approximations to the exact Hamiltonian trajectories that we can't solve for analytically. The small error inherent in symplectic integration causes these numerical trajectories to drift away from the true trajectories, and hence the projections of the numerical trajectories drift away from the typical set of the target distribution. We need to introduce a way to correct for this deviation.
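As a small illustration of that error (using an assumed standard Gaussian target, for which U(q) = q²/2 and the exact trajectories are circles in phase space), the energy error along a leapfrog trajectory oscillates but stays bounded rather than accumulating:

```python
import numpy as np

# Standard Gaussian target: U(q) = q**2 / 2, so grad_U(q) = q.
def hamiltonian(q, p):
    return 0.5 * q**2 + 0.5 * p**2

q, p, eps = 1.0, 0.0, 0.2
H0 = hamiltonian(q, p)

errors = []
for _ in range(1000):
    p -= 0.5 * eps * q      # leapfrog half step in momentum
    q += eps * p            # full step in position
    p -= 0.5 * eps * q      # final half step in momentum
    errors.append(abs(hamiltonian(q, p) - H0))

# The energy error oscillates but remains bounded (the hallmark of a
# symplectic integrator), yet it never vanishes, so a correction is needed.
print(f"max |H - H0| over 1000 steps: {max(errors):.2e}")
```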
The original implementation of Hamiltonian Monte Carlo considered the final point of a fixed-length trajectory as a proposal and then applied a Metropolis acceptance procedure to that proposal. If the numerical trajectory had accumulated too much error, and hence deviated too far from the initial energy, then that proposal would be rejected. In other words, the acceptance procedure throws away proposals that end up projecting too far away from the typical set of the target distribution, so that the only samples we keep are those that fall into the typical set.
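A minimal sketch of that original scheme, assuming a unit mass matrix (the function and parameter names here are illustrative, not any library's API):

```python
import numpy as np

def hmc_step(q, U, grad_U, eps, L, rng):
    """One step of the original HMC algorithm: fresh momentum, a fixed-length
    leapfrog trajectory, then Metropolis accept/reject on the energy change."""
    p = rng.standard_normal(q.shape)
    H0 = U(q) + 0.5 * p @ p                 # energy at the start of the trajectory

    q_new, p_new = q.copy(), p.copy()
    p_new -= 0.5 * eps * grad_U(q_new)      # leapfrog: initial half step
    for _ in range(L - 1):
        q_new += eps * p_new
        p_new -= eps * grad_U(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad_U(q_new)      # leapfrog: final half step

    H1 = U(q_new) + 0.5 * p_new @ p_new     # energy at the end of the trajectory
    # accept with probability min(1, exp(H0 - H1)); large energy error -> rejection
    if rng.uniform() < np.exp(H0 - H1):
        return q_new
    return q
```

For a standard Gaussian target one would take, say, `U = lambda q: 0.5 * q @ q` and `grad_U = lambda q: q` (again just illustrative choices).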
Note that the more modern implementations that I advocate for in the Conceptual paper are not in fact Metropolis-Hastings algorithms. Sampling a random trajectory and then a random point from that random trajectory is a more general way to correct for the numerical error introduced by symplectic integrators. Metropolis-Hastings is just one way to implement this more general algorithm, but slice sampling (as done in NUTS) and multinomial sampling (as currently done in Stan) work just as well if not better. But ultimately the intuition is the same -- we're probabilistically selecting points with small numerical error to ensure exact samples from the target distribution.
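A rough sketch of the multinomial flavor, simplified relative to Stan's dynamic trajectory construction: the initial point is placed uniformly at random within a length-L trajectory, and then one state is drawn from the whole trajectory with weights exp(−H), so states with small numerical error are selected with high probability. All names here are illustrative.

```python
import numpy as np

def leapfrog_step(q, p, grad_U, eps):
    """A single leapfrog step for H = U(q) + p^2 / 2."""
    p = p - 0.5 * eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

def multinomial_hmc_step(q, U, grad_U, eps, L, rng):
    """Sketch of multinomial trajectory sampling: build a trajectory that
    contains the initial point at a uniformly random position, then draw
    one state from the whole trajectory with weights exp(-H)."""
    p = rng.standard_normal(q.shape)
    energy = lambda q, p: U(q) + 0.5 * p @ p

    states = [(q, p)]
    energies = [energy(q, p)]

    # integrate a random number of steps backwards in time ...
    n_back = rng.integers(L + 1)
    qb, pb = q, -p
    for _ in range(n_back):
        qb, pb = leapfrog_step(qb, pb, grad_U, eps)
        states.insert(0, (qb, -pb))
        energies.insert(0, energy(qb, pb))

    # ... and the remaining steps forwards
    qf, pf = q, p
    for _ in range(L - n_back):
        qf, pf = leapfrog_step(qf, pf, grad_U, eps)
        states.append((qf, pf))
        energies.append(energy(qf, pf))

    # select one state with probability proportional to exp(-H)
    w = np.exp(-(np.array(energies) - np.min(energies)))  # stabilized weights
    idx = rng.choice(len(states), p=w / w.sum())
    return states[idx][0]
```

Note how Metropolis-Hastings falls out as the special case where the only candidate is the final state of the trajectory.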
Best Answer
As mentioned in the comments by cwl, bjw and Sycorax, the following resources are useful (I can recommend them from my own experience as well):