Let $M$ be your spacetime, a smooth manifold equipped with (pseudo) Riemannian metric (for example $\mathbb{R}^{(1,3)}$ for special relativity).
The set of reference frames is the frame bundle over $M$, usually denoted $FM$. Explicitly a frame at point $p$ in $M$ can be viewed as an ordered orthonormal basis (with respect to the the inner product defined by the metric) for the tangent space at $p$, $T_pM$.
For example, in metrics with Lorentzian signature in dimension 4, these frames are related by rotations in $\mathbb{R}^{(1,3)}$, aka Lorentz transformations, as expected.
You should really think about the variables we use as being like coordinates on some manifold, the configuration space (roughly the same as the phase space, I won't be careful about the distinction). In this language, changing variables is equivalent to changing coordinates on this manifold. The action is some scalar function on this space, and we can take coordinate derivatives, $\frac{\delta S}{\delta q_a}$ with respect to whatever coordinates $q_a$ we are using on the space. As in multivariable calculus, we can form the directional derivatives $D_v S = v_a\frac{\delta S}{\delta q_a}$ for any vector with components $v_a$ we like. If you want something more formal and geometric, the directional derivative is a Lie derivative on the configuration manifold.
Now, remember that when we vary the action, we demand that the variation be stationary. That is, we demand that all directional derivatives vanish, meaning $D_v S=0$ for all vectors $v$. This statement about the vanishing of directional derivatives, you'll note, it entirely agnostic to the coordinates we use, but nonetheless implies that if we use coordinates $q_a$, that $\frac{\delta S}{\delta q_a}=0$ for each $a$. But any coordinate system would result in the same condition that all the coordinate derivatives vanish. This should also be in some sense familiar from multivariable calculus.
So this is casting the invariance of the Euler-Lagrange equations in a geometric language. Aside from being nice, this is also going to be the right language to understand what's going on in the Hamiltonian picture.
The phase space is normally coordinatized by pairs of coordinates $(p^a,q_a)$, but this is really not necessary. At the end of the day, the phase space is again a manifold and the $(p,q)$ are simply a special coordinate system on that manifold (Darboux's theorem implies that we can always, at least locally, find such a coordinate system). The thing that defines these special coordinates, really, is that the symplectic form takes a very special form.
In case you are less than familiar with symplectic forms, let me do the following to motivate the idea. Instead of using the coordinates $(p^a,q_a)$, instead use a collective coordinate $\xi^a = \langle p^a, q_a\rangle$, so all I've really done is put the $p$'s and $q$'s into one big vector. Just to be clear, if $q_a$ and $p^a$ were $n$-dimensional vectors, then $\xi^a$ is a $2n$-dimensional vector formed by concatinating the components (well, any way of putting the components together will do...this will just change the precise form of the $\Omega$ I introduce in a moment by permuting it's rows and columns appropriately). In terms of this, the Hamilton's equations may now be written
$$
\frac{d\xi^a}{d t}=\Omega^{ab}\frac{\partial H}{\partial \xi^b}
$$
where $\Omega$ is some matrix. Usually, this looks something like
$$
\Omega=\left(\begin{array}{cc}
0 & -1\\
1 & 0\end{array}\right).
$$
This $\Omega$ is typically known as the inverse of the symplectic form. Though sometimes you'll just hear it called the symplectic form (an abuse of language, but not uncommon) or, more accurately, a Poisson bivector. These names are not so important to what I want to say, but I figure I may as well mention the correct terminology for anyone who wants to try searching online themselves.
Now, the symplectic form does, in fact, transform under coordinate changes just like you might expect any tensorial object over a manifold to do. If we take on faith that the symplectic form should transform as a tensor under coordinate changes, then we already know how the right-hand side of the rewritten Hamilton's equation transforms if we were to transform to some other coordinate system. Let's investigate the left-hand side.
Suppose we perform some transformation $\xi^a=\xi^a(\zeta)$ to a new coordinate system $\zeta$. Then by chain rule,
$$
\frac{d \xi^a}{d t}=\frac{\partial \xi^a}{\partial \zeta ^\mu}\frac{d\zeta^\mu}{d t},
$$
so we see, perhaps unsurprisingly the time derivative also transforms as a tensor under a coordinate change (I used $\mu$ for the indices of the new coordinates just to keep things visually distinct).
So in the end, we find that Hamilton's equations transform as
$$
\frac{d \xi^a}{d t}=\Omega^{ab}\frac{\partial H}{\partial \xi^b}\implies \frac{\partial \xi^a}{\partial \zeta ^\mu}\frac{d\zeta^\mu}{d t} = \Omega^{ab}\frac{\partial \zeta^\nu}{\partial \xi^b}\frac{\partial H}{\partial \zeta^\nu},
$$
which if we move the Jacobian on the left to the right, we find
$$
\frac{d\zeta^\mu}{dt}=\left(\frac{\partial\zeta^\mu}{\partial \xi^a}\Omega^{ab}\frac{\partial\zeta^\nu}{\partial\xi^b}\right)\frac{\partial H}{\partial \zeta^\nu}
$$
and hence we see that if we define a new symplectic form $\Omega^\prime$ by
$$
\Omega^{\prime\, \mu\nu}=\frac{\partial\zeta^\mu}{\partial \xi^a}\Omega^{ab}\frac{\partial\zeta^\nu}{\partial\xi^b},
$$
(which is equivalent to my assertion that $\Omega$ should be tensorial under coordinate changes) Hamilton's equation actually are invariant in the sense that we still have equations of the form
$$
\frac{d \zeta^\mu}{dt}=\Omega^{\prime\, \mu\nu}\frac{\partial H}{\partial\zeta^\nu}.
$$
The only difference is that the components of $\Omega$ and $\Omega^\prime$ might not be the same. But really this shouldn't be such a shock after all this setup since we wouldn't expect the components of a tensor to remain the same after a coordinate change.
Consider as an example the Minkowski metric. We know what this looks like in Cartesian coordinates. If we changed to polar coordinates, for example, of course the component entries in the metric change, but it's still, in a very real sense, the same metric. We just have a new representation thereof.
So where to canonical transformations fit into all this? They are simply the very special coordinate transformations which actually leave the components of the symplectic form invariant. Formally, these are coordinate transformations generated by vector fields over phase space whose Lie derivative of the symplectic form vanishes. This is very similar in many respects to a vector field being a Killing vector of some metric.
Finally, I should point out that the way I have framed the entire discussion above, it may seem strange why we should consider canonical transformations at all. After all, we can use any transformation at the cost of a nice form for the symplectic form. Perhaps non-canonical transforms can put the equations in a nice form.
In principle this is of course true. However canonical transformations play a very essential role which is intimately tied to Noether's theorem and symmetry. Essentially one can guarantee that every symmetry of the action corresponds to precisely a canonical transformation. Furthermore, only vector fields which correspond to canonical transformations are guaranteed to have a charge associated to them (like the Hamiltonian is the charge associated to time evolution (which is itself a canonical transformation)).
Best Answer
Because our world is (locally) Euclidean.
In an Euclidean world straight lines are favored and we subconsciously defined things like distance, area and volume using straight lines. If the real object we are studding is not made of straight lines, we cut it into small pieces, until we get approximately straight lines and then we sum up these small pieces to get the “real” value.
In an Euclidean world there is a direct correlation between Cartesian coordinates and lengths in the real world. For example, a circle in Cartesian coordinates is a circle in the real world; but a circle in the real world is a rectangle in polar coordinates.
Consider a surface bounded by the following curves in our world: $y=a_0 /x$, $y=a_1 /x$, $y=b_0 x$, $y=b_1 x$
In a world where their vertical lines correspond to a curve defined as $u=xy$ in ours and horizontal lines correspond to the curves $v=y/x$, this complex figure becomes simply a square with sides $a_1-a_0$ and $b_1-b_0$ and area $A=(a_1-a_0)(b_1-b_0) \, u.a$.
In our world it is not that simple and the area bounded by those lines is definitely not that. Now the Jacobian is the tool we use to convert the value of a measurement from one coordinate system to the value that would be obtained if the measurement were performed in Cartesian coordinates. It represents the infinitesimal relation between lengths of an object when drawing in one system to the other. Infinitesimal lengths can always be considered straight lines.
It is important to clarify that Cartesian coordinates is not the only “straight” coordinate system. Any system formed by non orthogonal straight lines can be considered. The thing is that we would consider those systems as sheered and then not necessarily natural.
PS: The Jacobian is always there. We just define $|J|=1$ for a Cartesian coordinate system because the results from this system have direct relation with our world. Some alien species living in a world with a very strong gravitational field (next to a black hole for example ;-), would have defined their $|J|=1$ to a different system.
EDIT:
Using $$I_c= \int \int dx \, dy$$ As the double integral in Cartesian coordinates, and $$I_p= \int \int dϱ \, dθ$$ As the double integral in polar coordinates.
Because of the correspondence of Cartesian coordinates and the “real world”, one could notice, either experimentally or by logic (using methods like squaring the circle and cutting the circle) that $I_c$ gives the same face value as the area bounded by the limiting curves. Therefore, one can define the area as $$A= \int \int \, dA=\int \int dx \, dy$$ And the element of area $$dA=dx \, dy$$ However, $I_p$ gives a different value (in this case the perimeter of the circle). From this point onwards, 2 things can be done;
1. Define $│J│=1$ in the starting system
With this definition, applying the Jacobian when passing from the original system to the next will allow you to get the same numeric value, but double integrals will have all sort of meaning, depending on the meaning it has on the starting system, making it a little ambiguous at minimum.
2. Define $│J│=1$ in the Cartesian system
With this definition, all systems will have not only the same numeric value, but all double integrals will result in area of the surface bounded by the curves.
The same for volume integrals.