Difficulty Understanding Sufficient Conditions for Weak Extrema in Calculus of Variations

calculus-of-variations, functional-analysis, variational-analysis

I am having a difficult time understanding Jacobi's necessary condition for weak extrema of functionals. Graphics and detailed explanations would be helpful. I am following these two texts:

  1. Calculus of Variations by Gelfand and Fomin
  2. The Calculus of Variations by Bruce van Brunt

Perhaps Introduction to the Calculus of Variations by Charles Fox has some insights, but I don't have the book yet.

I understand how the second variation can be written

$$
\delta^2J\left[h\right]=\frac{1}{2}\int_{a}^{b}\left( F_{y'y'}h'^2 + \left( F_{yy}-\frac{d}{dx}F_{yy'}\right)h^2 \right) \, dx
$$

or alternatively, if

$$
P = P(x) \doteq \frac{1}{2}F_{y'y'}, \qquad Q = Q(x) \doteq \frac{1}{2}\left( F_{yy} - \frac{d}{dx}F_{yy'} \right)
$$

as

$$
\delta^2J\left[h\right]=\int_{a}^{b}\left( Ph'^2 + Qh^2 \right) \, dx. \tag{1}\label{eq1}
$$

These forms follow from the raw second variation simply by using integration by parts.
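(To spell out the integration-by-parts step: before it, the second variation reads

$$
\delta^2J[h]=\frac{1}{2}\int_{a}^{b}\left( F_{y'y'}h'^2 + 2F_{yy'}hh' + F_{yy}h^2 \right) \, dx,
$$

and since $2F_{yy'}hh' = F_{yy'}\left(h^2\right)'$, integrating the middle term by parts with $h(a)=h(b)=0$ annihilates the boundary term and leaves $-\left(\frac{d}{dx}F_{yy'}\right)h^2$.)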

I also understand how Legendre wanted to write $\eqref{eq1}$ as a perfect square, adding

$$
\int_a^b \frac{d}{dx}\left( wh^2 \right) \, dx = \int_a^b\left( w'h^2 + 2whh'\right) \, dx = 0,
$$
where $w$ is an arbitrary differentiable function, to the LHS and RHS (the integral is $0$ since $h(a)=h(b)=0$ by definition).

Completing the square in the integrand and comparing to the general form of a binomial leads to the Riccati equation

$$
P\left( Q+w' \right) = w^2
$$
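(Explicitly: after adding the zero integral,

$$
\delta^2J[h]=\int_a^b\left( Ph'^2 + 2whh' + \left( Q+w' \right)h^2 \right) \, dx,
$$

and matching this against $\int_a^b P\left( h' + \frac{w}{P}h \right)^2 dx = \int_a^b\left( Ph'^2 + 2whh' + \frac{w^2}{P}h^2 \right) dx$ forces $Q+w' = \frac{w^2}{P}$, which is the Riccati equation above.)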

and, with the change of variables

$$
w=-\frac{u'}{u}P,
$$

yields the Jacobi accessory equation

$$
-\frac{d}{dx}\left(Pu'\right)+Qu=0. \tag{2}\label{eq2}
$$
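(One can verify the substitution directly: $w=-\frac{u'}{u}P$ gives

$$
w' = -\frac{\left(Pu'\right)'}{u} + P\left(\frac{u'}{u}\right)^2, \qquad \frac{w^2}{P} = P\left(\frac{u'}{u}\right)^2,
$$

so the Riccati equation $Q+w' = \frac{w^2}{P}$ collapses to $Qu = \frac{d}{dx}\left(Pu'\right)$, i.e. \eqref{eq2}.)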

We know if $w$ exists ("big if"), the completed square form of \eqref{eq1} is

$$
\delta^2J[h]=\int_a^bP\left( h'+\frac{w}{P}h \right)^2 \, dx
$$

which (as I learned from YouTube videos) is written in Fox's text, after substituting $w=-\frac{u'}{u}P$, as

$$
\delta^2J[h]=\int_a^bP\left( h'-\frac{u'}{u}h \right)^2 \, dx.
$$
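(The payoff of this form, as I understand it: since $P>0$ by the strengthened Legendre condition, the integral is nonnegative, and it vanishes only if $h'-\frac{u'}{u}h \equiv 0$, i.e. $h = cu$ for a constant $c$; if additionally $u$ has no zero on $[a,b]$, then $h(a)=0$ and $u(a)\ne0$ force $c=0$, so $h\equiv0$ and the second variation is positive definite.)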

What I don't understand are

  1. Conjugate points and how they help yield the sufficient condition. (For one thing, why don't we simply call them "zeros of nontrivial solutions to the Jacobi accessory equation \eqref{eq2}"? Why are they special? They seem to show up randomly.)

  2. Why does $u(a)$ have to be zero? Moreover, why is $u(b) \ne 0$ and why are we looking at $u$ over the semi-open interval $(a,b]$?

Best Answer

Alright, there's a lot to dissect here. I finally received the book by Fox, which helped only to an extent. I would definitely say, for anyone learning this for the first time: learn from the three texts below TOGETHER, and use your mathematical intuition to extract the vital concepts, because each text contains things I think are wrong on this subject (i.e. severely confusing/misleading to the point of counterproductivity). Together, however, they provided me with a complete picture. I've listed them in order of importance with regard to my questions (Bruce van Brunt's book yielded the most insight):

  1. The Calculus of Variations by Bruce van Brunt

  2. An Introduction to the Calculus of Variations by Charles Fox

  3. Calculus of Variations by Gelfand and Fomin

ANSWERS:

I will start with 2 because it leads nicely into 1. In addition, I note that conjugate points truly are zeros of nontrivial solutions to the Jacobi accessory equation. Their graphical interpretation is given a bit better in Bruce van Brunt's book and in Fox's book; suffice it to say, the notion of "conjugate" stems from the fact that the second variation is the infinite-dimensional analogue of a quadratic form. (To broadly illustrate: solutions to a quadratic equation have a "plus/minus" form, where the two solutions may be said to be conjugates of one another; this is seen most clearly when the solutions are complex and hence are complex conjugates.)

  2. Is $\mathbf{u(a) = 0}$?

    To begin, $u$ CANNOT be zero! In particular, Bruce van Brunt states (Section 10.4.2)

    "If there is a solution $u$ to [the Jacobi accessory] equation that is valid on $[x_0,x_1]$ and such that $\mathbf{u(x) \ne 0} $ for all $\mathbf{x \in [x_0,x_1]}$, then [the] transformation [$w=\frac{u'}{u}f_{y'y'}$] implies that the Riccati equation has a solution valid for $x \in [x_0,x_1]$."

    This appeared obvious to me from the definition of $w$ ($u$ being zero anywhere would make $w$ infinite), but I was confused by Gelfand & Fomin using $h$ both for the variation/perturbation given to $y$ and for the solution to the Jacobi accessory equation. They "second-handedly" define the Jacobi accessory equation as "the Euler equation of the second variation" (Gelfand & Fomin, Definition 1 on p. 111). They state in Section 26, footnote 7 (p. 130):

    "It must not be thought that this is done in order to find the minimum of the [second variation] functional. In fact, because of homogeneity, its minimum is either 0 if the functional is positive definite, or $- \infty$ otherwise. In the latter case, it is obvious that the minimum cannot be found from the Euler equation...The reader should also not be confused by our use of the same symbol $h(x)$ to denote both admissible functions, in the domain of the [second variation] functional, and solutions of [Jacobi accessory] equation. This notation is convenient, but whereas admissible functions must satisfy $h(a) = h(b) = 0$, the condition $h(b) = 0$ will usually be explicitly precluded for nontrivial solutions of [the Jacobi accessory equation]."

    I find this notation inconvenient and confusing, especially as the Jacobi accessory equation should not be conflated with the Euler equation (perhaps by "convenient" they meant "it is convenient that the equation happens to be an Euler equation in a respect"; possibly a language-barrier/translation issue).

    In particular, Bruce van Brunt notes

    "Finally, we note that if we consider the second variation as a function [of $h$] in its own right, the Jacobi accessory equation is the Euler-Lagrange equation for this functional. There is, however, a distinction to be made concerning solutions. Specificially, the functions [$h$] that solve the Euler-Lagrance equation must vanish at the endpoints. In contrast, we are actively seeking solutions to the Jacobi accessory equation that do not vanish in $(x_0, x_1]$."

    Clearly, the Jacobi and Euler-Lagrange equations should be treated separately, with separate variables: $u$ denotes solutions to the Jacobi accessory equation (an ODE), and $h$ denotes variations/perturbations in the second variation.

    (I would like to stress the first sentence of Gelfand & Fomin, Section 26, footnote 7 (p. 130): we are NOT looking to minimize the second variation. In fact, we are looking to find its sign. Just as in single-variable calculus, where we determine whether a point with $f'(x)=0$ is a minimum, maximum, or saddle from the sign of $f''(x)$, we determine whether a solution of the Euler-Lagrange equation is a minimum, maximum, or saddle of the functional from the sign of its second variation. This is where Bruce van Brunt does a great job, particularly in Section 10.1 and in introducing the Morse Lemma.)
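    (In symbols, the analogy I have in mind is the expansion

    $$
    J[y+h] = J[y] + \delta J[h] + \delta^2 J[h] + o\left( \| h \|^2 \right),
    $$

    the functional counterpart of $f(x+\epsilon) = f(x) + f'(x)\epsilon + \frac{1}{2}f''(x)\epsilon^2 + o(\epsilon^2)$ (with the $\frac{1}{2}$ absorbed into $\delta^2 J$, as in the definition above): on an extremal $\delta J[h] = 0$, so the sign of $\delta^2 J[h]$ decides the local behavior.)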

    Furthermore, the second variation is zero if $u$ solves the Jacobi accessory equation with $u(x_0)=u(x_1)=0$ (Bruce van Brunt, Lemma 10.4.5) (Gelfand & Fomin, Lemma on p. 108), and Legendre's (strengthened) condition necessitates $F_{y'y'}>0$. This mirrors the facts that the second variation is identically zero if and only if $h$ is identically zero (Gelfand & Fomin), and that the solutions $u$ of the Jacobi accessory equation corresponding to $h \not\equiv 0$ with $h(a)=h(b)=0$ are $u=\alpha h(x)$ for nonzero constants $\alpha$ (Gelfand & Fomin, Remark on p. 105) (Fox, p. 39). Fox helps illuminate why this is so by suggesting the reader plug $u(x)=\alpha h(x)$ into $h'(x)-\frac{u'(x)}{u(x)}h(x)$ and verify that it is indeed zero, which I feel would have benefited Gelfand & Fomin's text.
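    (The verification, spelled out: with $u=\alpha h$,

    $$
    h' - \frac{u'}{u}h = h' - \frac{\alpha h'}{\alpha h}h = h' - h' = 0,
    $$

    so the completed-square integrand vanishes identically for such $h$.)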

  1. Conjugate Points

    So if $u$ cannot be zero, why are we looking at zeros of $u$ (namely, conjugate points)? In short, it's to find what $u$ cannot be. With regard to question 2, on semi-open vs. closed interval notation, this is another area where Bruce van Brunt shines. To me, Bruce van Brunt is more consistent in his notation throughout, whereas Gelfand & Fomin struggle with their interval notation (which is only slightly cleared up when they present Jacobi's necessary condition on p. 111). A point cannot be its own conjugate, so Bruce van Brunt's semi-open interval notation is more appropriate than Gelfand & Fomin's closed-interval notation.

    At this point, things become difficult. The power of conjugate points is (to me) quite subtle, and it comes out in the proof. I'll give the answer through the proof of Jacobi's Necessary Condition.

Lemma 1

Let $u$ be a solution to the Jacobi accessory equation in $[a,b]$. If there is a point $c \in [a,b]$ such that $u(c) = 0$ and $u'(c) = 0$, then $u$ must be the trivial solution.
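(A sketch of why, assuming $P>0$ and smooth on $[a,b]$: dividing \eqref{eq2} by $-P$ puts it in the normal form

$$
u'' + \frac{P'}{P}u' - \frac{Q}{P}u = 0,
$$

a linear second-order ODE, and the initial conditions $u(c)=u'(c)=0$ then force $u \equiv 0$ by the uniqueness part of the Picard–Lindelöf theorem.)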

Lemma 2

Let $u$ be a solution to the Jacobi accessory equation in $[a,b]$ such that $u(a)=u(b)=0$. Then $\int_{a}^{b} \left( Pu'^2 + Qu^2 \right) \, dx = 0$ (i.e. the second variation is zero).
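(This follows from a single integration by parts:

$$
\int_a^b \left( Pu'^2 + Qu^2 \right) \, dx = \Big[ Pu'u \Big]_a^b - \int_a^b \left( \frac{d}{dx}\left(Pu'\right) - Qu \right) u \, dx = 0,
$$

since $u(a)=u(b)=0$ kills the boundary term and \eqref{eq2} kills the remaining integrand.)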

Theorem 1

If $y$ is a local minimum of $J$, then $\delta^2 J[h] \ge 0$ (i.e. the second variation is positive semidefinite).

If $y$ is a local maximum of $J$, then $\delta^2 J[h] \le 0$ (i.e. the second variation is negative semidefinite).

Theorem 2

Let $H$ be the set of functions $h$ smooth on $[a,b]$ such that $h(a)=h(b)=0$. Let $f$ be a smooth function of $x, y,$ and $y'$, and let $y$ be a smooth extremal for $J$ such that $P = f_{y'y'} > 0 \, \forall x \in [a,b]$. Then

  1. If $\delta^2J[h] > 0 \, \forall h \ne 0$, then there is no point conjugate to $a$ in $(a,b]$ (i.e. semi-open interval).
  2. If $\delta^2J[h] \ge 0 \, \forall h \in H$, then there is no point conjugate to $a$ in $(a,b)$ (i.e. open interval; interior).

Jacobi's Necessary Condition (Theorem 1 plus part 2 of Theorem 2)

Let $y$ be a smooth extremal for $J$ such that $P = f_{y'y'} > 0 \, \forall x \in [a,b]$ along $y$. If $y$ produces a minimum for $J$, then there are no points conjugate to $a$ in $(a,b)$.

NOTE: Jacobi's necessary condition does not preclude the possibility that the endpoint $b$ is conjugate to $a$; it concerns only the open interval $(a,b)$.
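To make this concrete, here is a minimal numerical sketch (my own illustration, in Python, not taken from any of the three texts): for $J[y]=\int_0^b \left( y'^2 - y^2 \right) dx$ the extremal $y \equiv 0$ gives $P=1$, $Q=-1$, so the Jacobi accessory equation reduces to $u''+u=0$; the nontrivial solution with $u(0)=0$ is $u=\sin x$, and the first point conjugate to $0$ is $\pi$. The code simply recovers that zero numerically (the helper names are mine):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Jacobi accessory equation -(P u')' + Q u = 0 for the model
# functional J[y] = ∫ (y'^2 - y^2) dx: here P = 1, Q = -1,
# so the equation reduces to u'' + u = 0.
def jacobi_rhs(x, state):
    u, up = state
    return [up, -u]  # u'' = (Q/P) u = -u

a, b = 0.0, 4.0
# Nontrivial solution with u(a) = 0; u'(a) = 1 just fixes the scale.
sol = solve_ivp(jacobi_rhs, (a, b), [0.0, 1.0], dense_output=True, max_step=0.01)

# The first zero of u in (a, b] is the point conjugate to a.
xs = np.linspace(a + 1e-6, b, 100_000)
us = sol.sol(xs)[0]
crossings = np.where(np.diff(np.sign(us)) != 0)[0]
print("first conjugate point ≈", xs[crossings[0]])  # ≈ π
```

Hence, by Jacobi's necessary condition, $y \equiv 0$ cannot minimize this $J$ once $b > \pi$.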

Proof

Part 1 of Theorem 2 (Jacobi's Necessary Condition follows from this)

Suppose $b$ is conjugate to $a$, i.e. there is a nontrivial solution $u$ of the Jacobi accessory equation with $u(a)=u(b)=0$. Then $u$ corresponds to an admissible variation $h$ (recall $u=\alpha h$; since $\alpha$ is a constant, it passes through the integral), so Lemma 2 applies and gives a zero second variation, which contradicts the second variation being positive definite. Hence, $b$ cannot be conjugate to $a$.

For the interval $(a,b)$, construct a 1-parameter family of positive definite functionals, $K(\mu)$,

$$ K(\mu) = \mu \, \delta^2 J[h] + (1-\mu)\int_a^b h'^2 \, dx, \qquad \mu \in [0,1]. $$

The integral on the right-hand side (RHS) has no points conjugate to $a$, since its Jacobi accessory equation is $u''(x)=0$, with general solution $u(x)=c_1 + c_2 x$ for constants $c_1$ and $c_2$. Only the trivial solution can satisfy $u(a)=u(\kappa)=0$ for any $\kappa \ne a$, hence there are no points conjugate to $a$ (by definition, conjugate points come only from nontrivial solutions of the Jacobi accessory equation). (This point was stated without proof in Gelfand & Fomin.)

That integral is positive definite since its integrand is a square, and the second variation is positive definite by hypothesis; therefore, $K(\mu)$ is positive definite for all $\mu \in [0,1]$. The Jacobi accessory equation for $K$ is

$$ -\frac{d}{dx} \left[ \left( \mu P + \left( 1 - \mu \right) \right) u' \right] + \mu Q u = 0. $$

(At $\mu=1$ this reduces to \eqref{eq2}; at $\mu=0$ it reduces to $u''=0$.)

The Jacobi accessory equation is, in general, a Sturm–Liouville equation, from which Fox shows that $u$ and $u_x$ cannot vanish simultaneously (Section 2.18, p. 55). Moreover, $u=u(x;\mu)$ is smooth in $x$, and smooth in $\mu$ since the coefficients of $u'$ and $u$ are continuous for each $\mu \in [0,1]$; therefore, $u$ has continuous partial derivatives $u_x$ and $u_{\mu}$.

Suppose there exists a family $U$ of nontrivial solutions to the Jacobi accessory equation above such that $u(a;\mu)=0$ for all $\mu \in [0,1]$, and suppose the second variation $K(1)$ has a conjugate point $\kappa \in (a,b)$. (Note that $K(1)$ is the second variation, while $K(0)$ is free of points conjugate to $a$.) Then there is a $u \in U$ such that $u(\kappa;1) = 0$. By Fox's argument (via Sturm–Liouville), if $u(\kappa;1)=0$, then $u_x(\kappa;1) \ne 0$. We can therefore invoke the implicit function theorem in a neighborhood of $(\kappa, 1)$ to assert the existence of a unique function $x(\mu)$ such that $u(x(\mu);\mu)=0$, $x(1)=\kappa$, and $x'(\mu)=-\frac{u_{\mu}}{u_x}$. The zero set $u(x(\mu);\mu)=0$ describes a parametric curve $\gamma$ with a continuous tangent that is nowhere horizontal (nonzero derivative). This leads to the five cases discussed in Gelfand & Fomin (see p. 110, Figure 7), where it is proven that no such curve $\gamma$ can exist.
