Maybe an example:
A particle moving in 2 dimensions has a Lagrangian
$$L = \frac{\dot{x}^2 +\dot{y}^2}{2} $$
So $$p_x = \frac{\partial L}{\partial \dot{x}} = \dot{x}$$ $$p_y = \frac{\partial L}{\partial \dot{y}}=\dot{y}$$
Suppose it's constrained to move on a circle $x^2+y^2=R^2$
Now there is a constraint on the momenta, which you get by differentiating the equation of the circle with respect to time: $$x\dot{x}+y\dot{y}=0,$$ i.e. $xp_x+yp_y=0$. This is a constraint, but not of the type you are talking about, since the Lagrangian is still regular.
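To see explicitly that this Lagrangian is regular, compute its Hessian with respect to the velocities:
$$\frac{\partial^2 L}{\partial \dot{q}_i\,\partial \dot{q}_j}=\begin{pmatrix}1&0\\0&1\end{pmatrix},\qquad \det = 1\neq 0.$$
The map $(\dot{x},\dot{y})\mapsto(p_x,p_y)=(\dot{x},\dot{y})$ is therefore invertible, and the circle condition restricts where the particle may move rather than the image of the Legendre transform.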
To obtain a Lagrangian which is singular rather than regular, we require the Hessian matrix $\frac{\partial^2L}{\partial \dot{q}_i \partial \dot{q}_j}$ to be degenerate, i.e. its determinant must vanish. This means that the Legendre transform (also called the fibre derivative, hence the notation $\mathcal{FL}$) from the tangent bundle to the cotangent bundle (phase space) $$\mathcal{FL} : TQ \rightarrow T^{*}Q$$ given by $$(q_i,\dot{q}_i) \rightarrow \left(q_i, p_i=\frac{\partial L}{\partial \dot{q}_i}\right)$$
is not invertible. Its image is restricted by a set of constraint functions. (Caveat: this assumes we are restricted to a neighbourhood where the rank of the Hessian is constant.)
For example, for the following Lagrangian $$L=\frac{1}{2}(\dot{x}^2+\dot{y}^2)+\dot{x}\dot{y}+4x\dot{y}+2x^2+4xy$$
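the Hessian determinant is easily seen to vanish, since all four entries of the Hessian matrix $\frac{\partial^2L}{\partial \dot{q}_i \partial \dot{q}_j}$ equal $1$. The generalized momenta are $$p_x=\dot{x}+\dot{y}$$ $$p_y=\dot{x}+\dot{y}+4x$$ Eliminating $\dot{x}$ and $\dot{y}$ from these relations gives the constraint equation $$p_y-p_x-4x=0.$$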
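As a quick numerical sanity check, here is a minimal plain-Python sketch (the helper names `lagrangian`, `momenta`, `velocity_hessian` and `constraint` are hypothetical, chosen for illustration). It computes the momenta and the velocity Hessian by central finite differences and confirms that the determinant vanishes and that $\phi = p_y - p_x - 4x$ is zero for arbitrary positions and velocities.

```python
# Numerical check of the singular Lagrangian example, plain Python only.

H = 1e-3  # finite-difference step (L is quadratic in the velocities, so
          # central differences are exact up to floating-point rounding)


def lagrangian(x, y, xd, yd):
    """L = (xd^2 + yd^2)/2 + xd*yd + 4*x*yd + 2*x^2 + 4*x*y."""
    return 0.5 * (xd**2 + yd**2) + xd * yd + 4 * x * yd + 2 * x**2 + 4 * x * y


def momenta(x, y, xd, yd):
    """Generalized momenta p_x = dL/d(xd), p_y = dL/d(yd) by central differences."""
    px = (lagrangian(x, y, xd + H, yd) - lagrangian(x, y, xd - H, yd)) / (2 * H)
    py = (lagrangian(x, y, xd, yd + H) - lagrangian(x, y, xd, yd - H)) / (2 * H)
    return px, py


def velocity_hessian(x, y, xd, yd):
    """2x2 matrix W[i][j] = d^2 L / d(qd_i) d(qd_j), as derivatives of the momenta."""
    def entry(i, j):
        plus = [xd, yd]
        minus = [xd, yd]
        plus[j] += H
        minus[j] -= H
        return (momenta(x, y, *plus)[i] - momenta(x, y, *minus)[i]) / (2 * H)
    return [[entry(0, 0), entry(0, 1)], [entry(1, 0), entry(1, 1)]]


def constraint(x, px, py):
    """Primary constraint phi = p_y - p_x - 4x, which vanishes on the image of the Legendre map."""
    return py - px - 4 * x


if __name__ == "__main__":
    for x, y, xd, yd in [(0.3, -1.2, 2.0, 0.5), (1.0, 2.0, -3.0, 4.0)]:
        W = velocity_hessian(x, y, xd, yd)
        det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
        px, py = momenta(x, y, xd, yd)
        print(f"det W = {det:.2e}, phi = {constraint(x, px, py):.2e}")
```

Both printed numbers should be zero up to rounding error. Finite differences are used only to keep the sketch free of external libraries; a symbolic package would give the exact entries directly.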
A canonical transformation $(q^i,p_j) \to (Q^i,P_j)$ preserves the form of Hamilton's equations.
Similarly, a symplectic transformation$^1$ $(q^i,p_j) \to (Q^i,P_j)$ preserves the Poisson structure; such a map is also known as a symplectomorphism. In other words, all the fundamental Poisson brackets (PB)
$$ \{ q^i,p_j \} ~=~ \delta^i_j, \qquad \{q^i,q^j \}~=~0, \qquad \{ p_i,p_j \} ~=~ 0,\qquad i,j \in\{1, \ldots, n\},$$
have the same form in the new coordinates
$$ \{ Q^i,P_j \} ~=~ \delta^i_j, \qquad \{Q^i,Q^j \}~=~0, \qquad \{ P_i,P_j \} ~=~ 0,\qquad i,j \in\{1, \ldots, n\}. $$
In particular, to answer OP's question (v2): the relations $\{Q^i,Q^j \}=0$ and $\{P_i,P_j\} = 0$ are only trivial if $n=1$, because of the skew-symmetry of the PB.
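For $n=1$, skew-symmetry alone gives $\{Q,Q\}=-\{Q,Q\}$, hence $\{Q,Q\}=0$ automatically. For $n\geq 2$ the condition has content. For instance, a choice of new coordinates starting with $Q^1=q^1$, $Q^2=p_1$ (an illustrative choice) gives
$$\{Q^1,Q^2\}=\{q^1,p_1\}=1\neq 0,$$
so no canonical transformation can have these two functions among its new positions.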
As is well-known, canonical and symplectic transformations are the same.
For a proof [at least in the case of restricted transformations, i.e. transformations without explicit time dependence], see e.g. Ref. 1, which uses so-called symplectic notation. An important point is that the Jacobian matrix of a symplectic transformation must be a symplectic matrix.
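The symplectic-matrix condition can be tested directly on a Jacobian. Below is a minimal plain-Python sketch (the helper names are hypothetical), assuming phase-space coordinates ordered as $z=(q^1,\dots,q^n,p_1,\dots,p_n)$ and $J=\begin{pmatrix}0&I_n\\-I_n&0\end{pmatrix}$. It checks $M^TJM=J$, which for symplectic matrices is equivalent to $MJM^T=J$.

```python
# Test whether a Jacobian matrix M satisfies the symplectic condition M^T J M = J,
# with phase-space coordinates ordered as z = (q^1..q^n, p_1..p_n).


def symplectic_form(n):
    """Return J = [[0, I_n], [-I_n, 0]] as a list of lists."""
    J = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def is_symplectic(M, tol=1e-12):
    """True if M^T J M equals J entrywise up to tol (M must be 2n x 2n)."""
    n = len(M) // 2
    J = symplectic_form(n)
    lhs = matmul(transpose(M), matmul(J, M))
    return all(abs(lhs[i][j] - J[i][j]) <= tol
               for i in range(2 * n) for j in range(2 * n))


if __name__ == "__main__":
    # n = 1; rows are the gradients of Q and P with respect to (q, p)
    print(is_symplectic([[0.0, -1.0], [1.0, 0.0]]))  # (Q, P) = (-p, q): True
    print(is_symplectic([[2.0, 0.0], [0.0, 1.0]]))   # (Q, P) = (2q, p): False
    print(is_symplectic([[2.0, 0.0], [0.0, 0.5]]))   # (Q, P) = (2q, p/2): True
```

The rescaling $(Q,P)=(2q,p)$ fails because it multiplies $\{Q,P\}$ by $2$, whereas $(2q,p/2)$ compensates the scaling and passes. For a nonlinear transformation one would evaluate its Jacobian at each phase-space point and apply the same test.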
References:
- H. Goldstein, Classical Mechanics, Section 9.4 in the 3rd edition or Section 9.3 in the 2nd edition.
--
$^1$ In this answer we will for simplicity only discuss non-degenerate Poisson brackets in finite dimensions using globally defined coordinates.
I) At the classical level, there is no convexity condition. If an action functional $S$ yields a stationary action principle, so will the negative action $-S$. (Under a sign change, a convex function turns into a concave function and vice versa.) Or one could imagine a theory which is convex in one sector and concave in another sector.
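For instance, with a generic potential $V(q)$ (assumed here for illustration), the Lagrangians $L=\frac{m}{2}\dot{q}^2-V(q)$ and $-L$ give the same Euler-Lagrange equation,
$$\frac{d}{dt}\frac{\partial(\pm L)}{\partial\dot{q}}-\frac{\partial(\pm L)}{\partial q}=\pm\left(m\ddot{q}+V'(q)\right)=0\quad\Longleftrightarrow\quad m\ddot{q}=-V'(q),$$
even though $\partial^2 L/\partial\dot{q}^2=m$ and $\partial^2(-L)/\partial\dot{q}^2=-m$ have opposite signs.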
II) On the Lagrangian side $L(q,v,t)$, it is easy to find counterexamples showing that one cannot demand convexity in the position variables $q^i$, or in the time variable $t$ for that matter. (For the former, think e.g. of a Mexican hat potential.) So, as OP writes, convexity can at most concern the velocity variables $v^i$ in the Lagrangian, or the momentum variables $p_i$ in the Hamiltonian $H(q,p,t)$.
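Concretely, take a one-dimensional slice of the Mexican hat, $V(q)=\lambda(q^2-a^2)^2$ with $\lambda,a>0$, and $L=\frac{1}{2}\dot{q}^2-V(q)$. Then
$$\frac{\partial^2 L}{\partial q^2}=-V''(q)=4\lambda a^2-12\lambda q^2,$$
which is positive at $q=0$ and negative for $q^2>a^2/3$. So $L$ is neither convex nor concave in $q$, although it is convex in $v=\dot{q}$.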
III) In the Hamiltonian formulation, it is possible to perform the canonical transformation
$$(q^i,p^j)~\longrightarrow~(Q^i,P^j)~=~(-p^i,q^j)$$
which mixes position and momentum variables. From a Hamiltonian perspective, it is unnatural to impose convexity on half the canonical variables but not the other half.
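Explicitly, the brackets are preserved, since $\{Q^i,P^j\}=\{-p^i,q^j\}=\{q^j,p^i\}=\delta^{ij}$. Assuming for illustration a standard Hamiltonian $H=\frac{1}{2}\sum_i (p^i)^2+V(q)$, the inverse map $q^i=P^i$, $p^i=-Q^i$ turns it into
$$K(Q,P)=\frac{1}{2}\sum_i (Q^i)^2+V(P),$$
so the convexity that held in the old momenta now sits in the new positions, while the new momenta enter only through $V$.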
IV) The Lagrangian (density) may be modified with total divergence terms that don't change the Euler-Lagrange equations. These total divergence terms could in principle violate convexity.
V) The Legendre transformation could be singular. In fact, this is the starting point of constraint dynamics. This happens e.g. for the Maxwell Lagrangian density $${\cal L}~=~-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}.$$ See e.g. this Phys.SE post.
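To see the singularity explicitly: since $\partial\mathcal{L}/\partial(\partial_\mu A_\nu)=-F^{\mu\nu}$, the momentum conjugate to $A_\nu$ is
$$\pi^\nu=\frac{\partial\mathcal{L}}{\partial(\partial_0 A_\nu)}=-F^{0\nu},\qquad \pi^0=-F^{00}=0,$$
by antisymmetry of $F^{\mu\nu}$. Thus $\partial_0 A_0$ does not appear in $\mathcal{L}$ at all, the velocity Hessian has a zero row and column, and $\pi^0\approx 0$ is a primary constraint.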
VI) Quantum mechanically, we must demand that the Hamiltonian operator is self-adjoint and bounded from below, i.e. the theory should be unitary.
Perturbatively, this means that the free/quadratic kinetic term should be a positive (semi)definite form (and therefore a convex function). Zero-modes should be gauge-fixed. Interaction terms are usually treated perturbatively.
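Schematically, for a quadratic kinetic term $L_{\rm kin}=\frac{1}{2}M_{ij}(q)\,v^iv^j$ with symmetric $M_{ij}$ (notation introduced here for illustration),
$$\frac{\partial^2 L_{\rm kin}}{\partial v^i\,\partial v^j}=M_{ij}(q),$$
so $L_{\rm kin}$ is convex in $v$ exactly when $M$ is positive semidefinite. Zero eigenvalues of $M$ are the degenerate directions where the Legendre transformation is singular, as in the Maxwell example of point V.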
In conclusion, convexity does not seem to be a first principle per se, but rather a consequence of the type of QFTs that we are typically able to make sense of. It might be possible to give a non-perturbative definition of a non-convex (but unitary) theory.