I will try a very simplistic answer (I hope the more mathematically rigorous don't take offense). The way I have always seen it is that the essential information in the maximum (or minimum) principle is this: if you are at an extremum, then any deviation from it produces variations of second order, not first.
That is a very simplistic answer that glosses over many mathematical details, but that is the intuition, and hence its relation to the calculus of variations, Lagrangians, and Hamiltonians. Now when constraints are involved, just as in the calculus of variations, we apply a version of the method of Lagrange multipliers (this can also be viewed as a Legendre transform of the Lagrangian), and then $\lambda$ is defined to be the co-state (as mentioned above).
Mathematically (and crudely) speaking, if your state $x$ evolves on a manifold $M$, then your system $\dot x = f(x)$ defines a flow on $M$ via a vector field in the tangent bundle $TM$, while the optimal control, through the co-state, defines a flow on the cotangent bundle $T^*M$.
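Concretely, a standard statement of the Pontryagin conditions (sign conventions vary between texts; here I take the maximizing form, with running cost $L(x,u)$ and dynamics $\dot x = f(x,u)$): the control Hamiltonian and the resulting state/co-state flow read

$$ H(x,p,u) = p^{\top} f(x,u) - L(x,u), \qquad \dot x = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial x}, $$

with the optimal control $u^*$ maximizing $H$ pointwise in $u$ along the trajectory. The pair $(x,p)$ is exactly the point of $T^*M$ carried by this Hamiltonian flow.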
You can find an excellent Feynman lecture (the one on the Principle of Least Action in The Feynman Lectures on Physics) to develop intuition on the topic.
EDIT: There are deeper connections here which, if you are like me, might not be obvious to you (I was taught four years of mechanical engineering without ever coming across Lagrangian or Hamiltonian mechanics). In classical mechanics we use Newton's laws, primarily $F = ma$, to solve most problems. But this tends to be a coordinate-dependent method, and Lagrange, in his pursuit of a coordinate-free treatment of mechanics, invented Lagrangian mechanics, where in generalized coordinates $q$ Newton's second law becomes the condition that the Euler-Lagrange equations be satisfied.
$$ \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_j} \right) = \frac{\partial L}{\partial q_j} $$
Note that the Euler-Lagrange equations can also be derived from a calculus-of-variations perspective. Here the Lagrangian $L = T - V$ is defined as the kinetic energy minus the potential energy (in the optimal control setting it plays the role of the running cost function).
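As a quick worked example (the standard one-dimensional harmonic oscillator, chosen here for illustration), take $L = \tfrac{1}{2} m \dot q^2 - \tfrac{1}{2} k q^2$. Then

$$ \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right) = \frac{d}{dt}(m\dot q) = m\ddot q, \qquad \frac{\partial L}{\partial q} = -kq, $$

so the Euler-Lagrange equation gives $m\ddot q = -kq$, which is exactly $F = ma$ with the spring force $F = -kq$.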
Hamilton realized that there was a variational principle at work here and defined the Hamiltonian $H$ to be the Legendre transform of the Lagrangian, which by its definition introduces the co-state (adjoint) variable $p$; the classical laws then become the condition that Hamilton's equations be satisfied.
$$ \frac{\partial H}{\partial q_j} = -\dot p_j, \quad \frac{\partial H}{\partial p_j} = \dot q_j, \quad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t} $$
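To see Hamilton's equations in action numerically, here is a minimal sketch (plain Python; the unit-mass harmonic oscillator with $H = \tfrac12 p^2 + \tfrac12 q^2$ is my own choice of example) integrating $\dot q = \partial H/\partial p$ and $\dot p = -\partial H/\partial q$ with a symplectic (semi-implicit) Euler step:

```python
import math

def simulate(q0, p0, t_end, dt=1e-4):
    """Integrate q' = dH/dp = p, p' = -dH/dq = -q for H = p^2/2 + q^2/2."""
    q, p = q0, p0
    for _ in range(int(t_end / dt)):
        # Semi-implicit Euler: update p with the old q, then q with the new p.
        # This staggering keeps the orbit close to the true energy level set.
        p -= q * dt
        q += p * dt
    return q, p

# Starting at (q, p) = (1, 0), the exact solution is q(t) = cos(t),
# so after half a period (t = pi) we expect q close to -1 and p close to 0.
q, p = simulate(1.0, 0.0, math.pi)
```

The symplectic update is preferred over plain explicit Euler here because it (approximately) preserves the Hamiltonian, so the numerical orbit does not spiral outward.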
The variational principle is that the correct dynamics renders the action integral $S = \int_{t_1}^{t_2} L\, dt$ stationary (hence the name "Principle of Least Action", though "stationary action" is more accurate) and satisfies the above equations. So in the optimal control setting, when we form the Hamiltonian and set up the co-state equation, we are in essence following this principle, where the Lagrangian is now our cost function and the co-state plays the role of a Lagrange multiplier enforcing the constraint that the state adhere to the system dynamics.
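A toy worked example (my own choice, for illustration): minimize $J = \tfrac12 \int_0^1 u^2\, dt$ subject to $\dot x = u$, $x(0) = 0$, $x(1) = 1$. With the minimizing convention $H = \tfrac12 u^2 + p\,u$, the first-order conditions give

$$ \frac{\partial H}{\partial u} = u + p = 0 \;\Rightarrow\; u = -p, \qquad \dot p = -\frac{\partial H}{\partial x} = 0 \;\Rightarrow\; p \equiv \text{const}, $$

so the optimal control is constant, and the boundary conditions force $u^* \equiv 1$ (with $p \equiv -1$), i.e. the straight-line trajectory $x^*(t) = t$. The co-state $p$ is visibly acting as the multiplier attached to the constraint $\dot x = u$.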
Best Answer
In the book by Bressan and Piccoli, Introduction to the Mathematical Theory of Control, you can find your answer in Chapter 7, on sufficient conditions for optimality. Basically you have:
The first point is obvious. For a deeper explanation of the third point it is better to refer directly to the literature. For the second, I only want to recall that convexity must be understood with respect to a set of functions, generally a subset of $L^\infty$, because the value of the functional depends on the trajectory of the system under the action of the control law.
Your particular case, where the "ODE system to be controlled is given in terms of polynomials", is rather vague. I figure you mean something like
$$ \dot x^i = A^i + B^{i}_j x^j + C^{i}_{jk} x^j u^k + D^i_j u^j + E^i_{jk} x^j x^k + F^i_{jkl} x^j x^k u^l + G^i_{jk} u^j u^k + \cdots $$
So your description is rather general.