You should really think about the variables we use as being like coordinates on some manifold, the configuration space (roughly the same as the phase space, I won't be careful about the distinction). In this language, changing variables is equivalent to changing coordinates on this manifold. The action is some scalar function on this space, and we can take coordinate derivatives, $\frac{\delta S}{\delta q_a}$ with respect to whatever coordinates $q_a$ we are using on the space. As in multivariable calculus, we can form the directional derivatives $D_v S = v_a\frac{\delta S}{\delta q_a}$ for any vector with components $v_a$ we like. If you want something more formal and geometric, the directional derivative is a Lie derivative on the configuration manifold.
Now, remember that when we vary the action, we demand that the variation be stationary. That is, we demand that all directional derivatives vanish, meaning $D_v S=0$ for all vectors $v$. This statement about the vanishing of directional derivatives, you'll note, it entirely agnostic to the coordinates we use, but nonetheless implies that if we use coordinates $q_a$, that $\frac{\delta S}{\delta q_a}=0$ for each $a$. But any coordinate system would result in the same condition that all the coordinate derivatives vanish. This should also be in some sense familiar from multivariable calculus.
So this is casting the invariance of the Euler-Lagrange equations in a geometric language. Aside from being nice, this is also going to be the right language to understand what's going on in the Hamiltonian picture.
The phase space is normally coordinatized by pairs of coordinates $(p^a,q_a)$, but this is really not necessary. At the end of the day, the phase space is again a manifold and the $(p,q)$ are simply a special coordinate system on that manifold (Darboux's theorem implies that we can always, at least locally, find such a coordinate system). The thing that defines these special coordinates, really, is that the symplectic form takes a very special form.
In case you are less than familiar with symplectic forms, let me do the following to motivate the idea. Instead of using the coordinates $(p^a,q_a)$, instead use a collective coordinate $\xi^a = \langle p^a, q_a\rangle$, so all I've really done is put the $p$'s and $q$'s into one big vector. Just to be clear, if $q_a$ and $p^a$ were $n$-dimensional vectors, then $\xi^a$ is a $2n$-dimensional vector formed by concatinating the components (well, any way of putting the components together will do...this will just change the precise form of the $\Omega$ I introduce in a moment by permuting it's rows and columns appropriately). In terms of this, the Hamilton's equations may now be written
$$
\frac{d\xi^a}{d t}=\Omega^{ab}\frac{\partial H}{\partial \xi^b}
$$
where $\Omega$ is some matrix. Usually, this looks something like
$$
\Omega=\left(\begin{array}{cc}
0 & -1\\
1 & 0\end{array}\right).
$$
This $\Omega$ is typically known as the inverse of the symplectic form. Though sometimes you'll just hear it called the symplectic form (an abuse of language, but not uncommon) or, more accurately, a Poisson bivector. These names are not so important to what I want to say, but I figure I may as well mention the correct terminology for anyone who wants to try searching online themselves.
Now, the symplectic form does, in fact, transform under coordinate changes just like you might expect any tensorial object over a manifold to do. If we take on faith that the symplectic form should transform as a tensor under coordinate changes, then we already know how the right-hand side of the rewritten Hamilton's equation transforms if we were to transform to some other coordinate system. Let's investigate the left-hand side.
Suppose we perform some transformation $\xi^a=\xi^a(\zeta)$ to a new coordinate system $\zeta$. Then by chain rule,
$$
\frac{d \xi^a}{d t}=\frac{\partial \xi^a}{\partial \zeta ^\mu}\frac{d\zeta^\mu}{d t},
$$
so we see, perhaps unsurprisingly the time derivative also transforms as a tensor under a coordinate change (I used $\mu$ for the indices of the new coordinates just to keep things visually distinct).
So in the end, we find that Hamilton's equations transform as
$$
\frac{d \xi^a}{d t}=\Omega^{ab}\frac{\partial H}{\partial \xi^b}\implies \frac{\partial \xi^a}{\partial \zeta ^\mu}\frac{d\zeta^\mu}{d t} = \Omega^{ab}\frac{\partial \zeta^\nu}{\partial \xi^b}\frac{\partial H}{\partial \zeta^\nu},
$$
which if we move the Jacobian on the left to the right, we find
$$
\frac{d\zeta^\mu}{dt}=\left(\frac{\partial\zeta^\mu}{\partial \xi^a}\Omega^{ab}\frac{\partial\zeta^\nu}{\partial\xi^b}\right)\frac{\partial H}{\partial \zeta^\nu}
$$
and hence we see that if we define a new symplectic form $\Omega^\prime$ by
$$
\Omega^{\prime\, \mu\nu}=\frac{\partial\zeta^\mu}{\partial \xi^a}\Omega^{ab}\frac{\partial\zeta^\nu}{\partial\xi^b},
$$
(which is equivalent to my assertion that $\Omega$ should be tensorial under coordinate changes) Hamilton's equation actually are invariant in the sense that we still have equations of the form
$$
\frac{d \zeta^\mu}{dt}=\Omega^{\prime\, \mu\nu}\frac{\partial H}{\partial\zeta^\nu}.
$$
The only difference is that the components of $\Omega$ and $\Omega^\prime$ might not be the same. But really this shouldn't be such a shock after all this setup since we wouldn't expect the components of a tensor to remain the same after a coordinate change.
Consider as an example the Minkowski metric. We know what this looks like in Cartesian coordinates. If we changed to polar coordinates, for example, of course the component entries in the metric change, but it's still, in a very real sense, the same metric. We just have a new representation thereof.
So where to canonical transformations fit into all this? They are simply the very special coordinate transformations which actually leave the components of the symplectic form invariant. Formally, these are coordinate transformations generated by vector fields over phase space whose Lie derivative of the symplectic form vanishes. This is very similar in many respects to a vector field being a Killing vector of some metric.
Finally, I should point out that the way I have framed the entire discussion above, it may seem strange why we should consider canonical transformations at all. After all, we can use any transformation at the cost of a nice form for the symplectic form. Perhaps non-canonical transforms can put the equations in a nice form.
In principle this is of course true. However canonical transformations play a very essential role which is intimately tied to Noether's theorem and symmetry. Essentially one can guarantee that every symmetry of the action corresponds to precisely a canonical transformation. Furthermore, only vector fields which correspond to canonical transformations are guaranteed to have a charge associated to them (like the Hamiltonian is the charge associated to time evolution (which is itself a canonical transformation)).
Best Answer
Here is how I understand it: let's say you have some coordinates $p_i$ and some momenta $q_i$. You want to find transformations
$$Q_i\equiv Q_i(q_i,\,...,\,q_n,\,p_i,...,\,p_n,\,t)\qquad P_i\equiv P_i(q_i,\,...,\,q_n,\,p_i,...,\,p_n,\,t) $$
These variables $p_i$, $q_i$, $Q_i$, and $P_i$ must satisfy the Hamilton's equations of motion:
$$\dot{q}_i=\frac{\partial H}{dp_i},\quad \dot{p}_i=-\frac{\partial H}{dq_i},\quad \dot{Q}_i=\frac{\partial K}{dP_i},\quad \dot{P_i} = -\frac{\partial K}{dQ_i}$$
where $K$ is the transformed Hamiltonian $K\equiv K(Q,\,P,\,t)$.
To perform the canonical transformation, we may be able to introduce the function $F=F_1$, where
$$P_i \dot{Q}_i-K+\frac{dF_1}{dt}=p_i \dot{q}_i-H$$
or we could introduce $F=F_2-Q_iP_i$, and we thus have
$$-\dot{P}_i Q_i-K+\frac{dF_2}{dt}=p_i \dot{q}_i-H $$
Similarly for the generating functions for $F_3$ and $F_4$.
Now that we understand their differences, we have to ask how we use them. What you use is based upon what you know. If I want to find $F_1$, I integrate $p_i$ with respect to $q_i$ and $-P_i$ with respect to $Q_i$, and combine the result to find the whole value for $F_1$. A similar process follows for the identities for $F_2$, $F_3$, and $F_4$. Now, if I have the opposite problem--namely, I have $F_1$, and I want to find the coordinates and/or momenta--then I take the partial with respect to $q_i$ to find $p_i$ and the partial with respect to $Q_i$, take the negative, and I find $P_i$.
If I understand your question correctly, you have a set of $Q$'s and $P$'s, and you want to find $F$. You can pick whatever $F$ you want--you just need to change the definition of the generating function as given above.
Hope this helps. These slides might also be of assistance: http://users.physics.harvard.edu/~morii/phys151/lectures/Lecture20.pdf