Why do Lagrange multipliers work in function spaces?

calculus-of-variations optimization

I know that to extremise a functional $F[y]$ subject to the constraint $G[y]=0$, you may use Lagrange multipliers to transform the problem into an unconstrained one.

In finite dimensions (in the differentiable case), I understand Lagrange multipliers as a way to encode both the constraint and the necessary condition that the derivative of the objective function be orthogonal to the tangent space of the constraint surface.

In all the references (and StackExchange answers) I have seen, the justification in the case of functionals is quite glib — but it seems to me that a similar argument about (functional) derivatives and orthogonality is much messier and more complicated, and it's not clear at all why this method should work.

Best Answer

You need to know a pinch of functional analysis in Banach spaces, but apart from that the basic argument is fairly straightforward linear algebra, and it is the same whether you consider a functional problem or a finite-dimensional one.

Let $Y$ be a Banach space and consider two continuous linear maps $DF: Y\to {\Bbb R}$ and $DG : Y \to {\Bbb R}^k$ with the additional requirement that $DG$ is surjective. Under these conditions:

Lemma : (1) ${\rm ker} DG \subset {\rm ker} DF$ iff (2) there exists a linear map $\Lambda : {\Bbb R}^k \to {\Bbb R}$ so that $DF=\Lambda \circ DG$.

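Here $DF$ and $DG$ play the role of the derivatives of $F$ and $G$ at the point under consideration. Condition (1) expresses that $F$ is stationary at that point subject to the constraint $G=0 \in {\Bbb R}^k$, where $G$ is a regular constraint ($DG$ surjective): the directions tangent to the constraint set are exactly the $v \in {\rm ker}\, DG$, and for each of them $DF(v)=0$. Condition (2) states the existence of Lagrange multipliers at the given point.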
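To see how (2) matches the usual multiplier formula, it may help to write it in coordinates. A linear map $\Lambda : {\Bbb R}^k \to {\Bbb R}$ is given by a vector $(\lambda_1,\dots,\lambda_k)$, and $DG$ has components $DG_1,\dots,DG_k$, so (2) reads

$$DF(v) = \sum_{i=1}^{k} \lambda_i \, DG_i(v) \quad \text{for all } v \in Y,$$

that is, the derivative of $F - \sum_i \lambda_i G_i$ vanishes at the point. As an illustration only (the integral form below is an assumption, not part of the lemma): if $k=1$, $F[y]=\int f(x,y,y')\,dx$ and $G[y]=\int g(x,y,y')\,dx - c$, with sufficiently smooth integrands and variations vanishing at the endpoints, this becomes the Euler-Lagrange equation for $f-\lambda g$:

$$\frac{\partial (f-\lambda g)}{\partial y} - \frac{d}{dx}\,\frac{\partial (f-\lambda g)}{\partial y'} = 0 .$$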

That (2) implies (1) is obvious. For the converse, $Z={\rm ker}\, DG$ is a closed subspace of $Y$ of codimension $k$ and therefore admits a $k$-dimensional complement $W$, so that $Y = Z \oplus W$ (this is where some functional analysis is used). The restriction $DG|_W : W \to {\Bbb R}^k$ is bijective, so it has an inverse $A : {\Bbb R}^k \to W$. Furthermore $DF = DF \circ A \circ DG$, since both sides agree on $Z$ and on $W$: on $Z$ both vanish (the left side by (1), the right side because $DG=0$ there), and on $W$ we have $A \circ DG = {\rm id}_W$. Finally, setting $\Lambda=DF \circ A: {\Bbb R}^k \to {\Bbb R}$ gives the desired form: $DF=\Lambda \circ DG$.
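The construction in the proof can be checked numerically in a finite-dimensional stand-in for $Y$. The following is a minimal sketch, assuming $Y={\Bbb R}^n$ with NumPy arrays for the maps; the function name `multipliers_from_proof`, the matrix `B` spanning the complement $W$, and the random test data are all hypothetical choices made for illustration.

```python
import numpy as np

def multipliers_from_proof(DF, DG, B):
    """Follow the proof: W = column space of B is a complement of ker DG,
    A inverts DG restricted to W, and Lambda = DF o A."""
    A = B @ np.linalg.inv(DG @ B)   # n x k matrix with DG @ A = I_k
    return DF @ A                   # vector of k multipliers

rng = np.random.default_rng(0)
n, k = 6, 2
DG = rng.standard_normal((k, n))    # rank k (surjective) for generic entries

# Orthonormal basis of Z = ker DG: the last n-k right singular vectors.
_, _, Vt = np.linalg.svd(DG)
Z = Vt[k:].T                        # n x (n-k)

# Build a DF that vanishes on Z by removing the Z-component of a random vector.
c = rng.standard_normal(n)
DF = c - Z @ (Z.T @ c)

# A generic complement W of Z; it does not need to be orthogonal to Z.
B = rng.standard_normal((n, k))
lam = multipliers_from_proof(DF, DG, B)

print(np.allclose(DF @ Z, 0))       # condition (1): ker DG inside ker DF
print(np.allclose(DF, lam @ DG))    # condition (2): DF = Lambda o DG
```

Both checks are expected to hold up to floating-point tolerance. Choosing a different `B` changes $A$ but not the resulting multipliers, because $DG$ is surjective and $\Lambda$ is therefore determined by $DF=\Lambda\circ DG$.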
