- The solution (if one is found) of the NLP in direct methods is locally optimal but not necessarily globally optimal (except for some special types of NLPs, e.g. convex NLPs).
- For the indirect approach, you solve, as you already mentioned, the necessary conditions for optimality of the optimal control problem (minimum principle, ...). This yields a boundary value problem (BVP), which is often nonlinear and has several locally optimal solutions (optimal state and optimal costate). The problem here is that solving the BVP gives you only ONE locally optimal solution (you don't know whether it is globally optimal). To find the globally optimal one, you would have to determine all locally optimal solutions and then use the HJB equation to check which of them are also globally optimal.
In general, the solution depends on your initial guess (when solving numerically). Two different initial guesses may yield two different solutions. This holds for both methods (direct and indirect).
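A minimal sketch of this initial-guess dependence (the toy objective below is my own stand-in for a discretized, nonconvex optimal-control cost, not a problem from the answer):

```python
import numpy as np
from scipy.optimize import minimize

# Toy nonconvex objective standing in for a direct-method NLP cost.
# It has two local minima, near z = +1 and z = -1.
def cost(z):
    return (z[0]**2 - 1.0)**2 + 0.1 * z[0]

sol_a = minimize(cost, x0=[0.9])    # initial guess in the right-hand basin
sol_b = minimize(cost, x0=[-0.9])   # initial guess in the left-hand basin

# Two different initial guesses, two different local optima:
print(sol_a.x[0], cost(sol_a.x))
print(sol_b.x[0], cost(sol_b.x))
```

Here the left-hand minimum happens to be the global one; a local NLP solver started in the other basin never sees it, which is exactly the caveat above.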
I don't understand your question! Please reformulate!
I am not sure why it is called that. The main differences between direct and indirect methods are:
General optimal control problem: minimize a functional (performance measure) subject to constraints/bounds and, of course, the dynamics.
Direct method: solve this optimal control problem DIRECTLY, as it is written there.
Indirect method: solve the necessary conditions (a BVP), which INDIRECTLY represent the original optimal control problem. You just rewrite the optimal control problem (which is an optimization problem) as another problem. Both problems are, in this sense, equivalent.
Then you can use the same numerical techniques (e.g. collocation, shooting) for both direct and indirect methods; you only solve different systems of ODEs with different constraints.
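As an illustration of the indirect route, here is a sketch that solves the necessary-condition BVP by collocation (`scipy.integrate.solve_bvp`) for a simple problem I chose for this example, not one from the thread: minimize ∫₀¹ (x² + u²) dt subject to ẋ = u, x(0) = 1:

```python
import numpy as np
from scipy.integrate import solve_bvp

# Pontryagin: H = x^2 + u^2 + p*u;  dH/du = 2u + p = 0  =>  u = -p/2.
# State:   x' = u = -p/2
# Costate: p' = -dH/dx = -2x
# Boundary conditions: x(0) = 1 (fixed), p(1) = 0 (free terminal state).
def odes(t, y):
    x, p = y
    return np.vstack([-p / 2.0,     # x' = -p/2
                      -2.0 * x])    # p' = -2x

def bc(ya, yb):
    return np.array([ya[0] - 1.0,   # x(0) = 1
                     yb[1]])        # p(1) = 0

t = np.linspace(0.0, 1.0, 50)
y_guess = np.zeros((2, t.size))     # initial guess: the solution depends on it
sol = solve_bvp(odes, bc, t, y_guess)

# Analytic optimum for this linear-quadratic problem: x(t) = cosh(1 - t)/cosh(1)
print(sol.y[0, -1])                 # x(1), should be close to 1/cosh(1)
```

Since this particular problem is linear-quadratic, the BVP is linear and any initial guess converges; in a nonlinear problem the same code would return whichever local solution the guess leads to, as discussed above.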
I will try a very simplistic answer (I hope the more mathematically rigorous don't take offense). The way I have always seen it is that, essentially, the information in the maximum or minimum principle is that if you are at an extremum, then any deviation from it causes variations at second order, not first.
That is a very simplistic answer that glosses over a lot of mathematical detail, but that is the intuition, and hence its relation to the Calculus of Variations, Lagrangians and Hamiltonians. Now, when there are constraints involved, just like in the Calculus of Variations, we apply a version of the method of Lagrange multipliers (it can also be viewed as a Legendre transform of the Lagrangian), and then $\lambda$ is defined to be the costate (as mentioned above).
Mathematically (and crudely) speaking, if your state $x$ evolves on a manifold $M$, then your system $ \dot x = f(x)$ defines a flow on the tangent bundle $TM$ and the optimal control defines via the costate a flow on the cotangent bundle, $T^*M$.
You can find an excellent Feynman Lecture to develop intuition on the topic.
EDIT: There are deeper connections here, which, if you are like me, might not be obvious to you (I was taught four years of Mechanical Engineering without ever coming across Lagrangian or Hamiltonian mechanics). In classical mechanics we use Newton's laws, primarily $F = ma$, to solve most problems. But this tends to be a coordinate-dependent method, and Lagrange, in his pursuit of a coordinate-free treatment of mechanics, invented Lagrangian mechanics, where, in generalized coordinates $q$, Newton's second law became the condition that the Euler-Lagrange equations must be satisfied.
$$ \frac{d}{{dt}}\left( {\frac{{\partial L}}{{\partial {{\dot q}_j}}}} \right) = \frac{{\partial L}}{{\partial {q_j}}} $$
Note that the Euler-Lagrange equations can also be derived from a calculus of variations perspective. Here the Lagrangian $L=T-V$ is defined as the kinetic energy less the potential energy (in the optimal control setting it tends to be the running cost function).
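To make this concrete, here is a small symbolic check (my own example, not from the answer) that the Euler-Lagrange equation applied to $L = T - V$ for a mass on a spring recovers Newton's second law:

```python
import sympy as sp

t, m, k = sp.symbols('t m k', positive=True)
q = sp.Function('q')(t)

# Lagrangian L = T - V: kinetic energy minus potential energy.
L = m * q.diff(t)**2 / 2 - k * q**2 / 2

# Euler-Lagrange: d/dt(dL/dq') - dL/dq = 0
eom = sp.diff(sp.diff(L, q.diff(t)), t) - sp.diff(L, q)
print(sp.simplify(eom))   # m*q'' + k*q, i.e. Newton's law  m q'' = -k q
```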
Hamilton realized that there was a variational principle involved here and defined the Hamiltonian $H$ to be the Legendre transform of the Lagrangian, which by its definition introduced the costate (adjoint) variable; the classical laws then became the condition that Hamilton's equations be satisfied.
$$ \frac{{\partial H}}{{\partial {q_j}}} = - {\dot p_j},\quad \frac{{\partial H}}{{\partial {p_j}}} = {\dot q_j},\quad \frac{{\partial H}}{{\partial t}} = - \frac{{\partial L}}{{\partial t}} $$
The variational principle was that the correct dynamics renders stationary (and often minimizes) the action integral defined as $ S = \int_{{t_1}}^{{t_2}} {L\,dt} $ and satisfies the above equations. So in the optimal control setting, when we form the Hamiltonian and set up the costate equation, we are in essence following this "Principle of Least Action", where the Lagrangian is now our cost function, and the costate can be thought of as a Lagrange multiplier that enforces the condition that the state adheres to the system dynamics.
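The Legendre transform itself can be carried out symbolically. A sketch (again for the mass-on-a-spring Lagrangian, my own choice of example): define $p = \partial L/\partial \dot q$, eliminate $\dot q$, and form $H = p\dot q - L$, from which Hamilton's equations follow by differentiation:

```python
import sympy as sp

m, k, q, qdot, p = sp.symbols('m k q qdot p')

# Lagrangian of a mass on a spring.
L = m * qdot**2 / 2 - k * q**2 / 2

# Legendre transform: p = dL/dqdot, then H(q, p) = p*qdot - L with qdot eliminated.
p_def = sp.diff(L, qdot)                          # p = m*qdot
qdot_of_p = sp.solve(sp.Eq(p, p_def), qdot)[0]    # qdot = p/m
H = sp.simplify((p * qdot - L).subs(qdot, qdot_of_p))
print(H)                                          # H = p**2/(2*m) + k*q**2/2

# Hamilton's equations: q' = dH/dp,  p' = -dH/dq
print(sp.diff(H, p))     # p/m
print(-sp.diff(H, q))    # -k*q
```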
Best Answer
Concerning the theoretical part, I wouldn't say that the costate equations must be solved backwards. To take a step back, in the usual setting for the Lagrange problem, $x(0)$ is fixed and $x(T)$ is free. In this setting, for the costate, $p(0)$ is free and $p(T) = 0$. But for slightly different settings where part of the initial state $x(0)$ and part of the final state $x(T)$ are constrained (known) and others free (unknown), you would have complementary conditions on $p(0)$ and $p(T)$. This comes from the proof of the maximum principle and the "dual" nature of $p$.
At the heuristic level, you can somehow understand it. In Pontryagin's maximum principle, the costate variable is there precisely to help you transform a local-in-time maximization criterion into something that turns out to be optimal over the whole time domain. In some sense, it thus embeds information about the future, i.e. $p(t)$ should contain information about what will happen on $[t,T]$.
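A small numerical sketch of this "information about the future" view (the trajectory and dynamics below are my own toy choices, not from the answer): given a state trajectory and the transversality condition $p(T)=0$, the costate ODE is naturally integrated backward from $T$, so $p(t)$ accumulates information from $[t,T]$:

```python
import numpy as np
from scipy.integrate import solve_ivp

T = 1.0

# A given (here: known analytic) state trajectory for min ∫(x^2 + u^2) dt, x' = u.
x = lambda t: np.cosh(T - t) / np.cosh(T)

# Costate equation p' = -dH/dx = -2x, with terminal condition p(T) = 0.
costate_rhs = lambda t, p: -2.0 * x(t)

# Integrate backward in time: t_span runs from T down to 0.
sol = solve_ivp(costate_rhs, (T, 0.0), [0.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

# p(0) is recovered even though only p(T) was known a priori;
# analytically p(t) = 2*sinh(T - t)/cosh(T), so p(0) = 2*tanh(1).
print(sol.sol(0.0)[0])
```

This is the backward half of a forward-backward sweep: state forward from $x(0)$, costate backward from $p(T)$, which is why the backward solve feels natural even though, as noted above, it is not forced in every setting.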
Now for the numerical simulation,