[Math] Euler-Lagrange assumptions

calculus-of-variations optimization

I have a question related to these two posts:

(1) Euler-Lagrange, Gradient Descent, Heat Equation and Image Denoising

and

(2) When the Euler Lagrange equation simplifies to zero

Background


Suppose we have a functional (similar to (1)):

$E(u,u_x,u_y)=\int_\Omega(u_x^2 + u_y^2)dxdy$,

where $\Omega$ is the image domain, and $u(x,y)$ is a grayscale $256\times256$ image.
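For concreteness, the functional above (the Dirichlet energy) can be approximated on a discrete image with finite differences. A minimal sketch, where the function name and the use of forward differences are my choices, not anything prescribed by the posts linked above:

```python
import numpy as np

def dirichlet_energy(u):
    """Approximate E(u) = integral of (u_x^2 + u_y^2) dx dy
    using forward differences on a 2D array."""
    ux = np.diff(u, axis=1)  # horizontal finite difference, approximates u_x
    uy = np.diff(u, axis=0)  # vertical finite difference, approximates u_y
    return np.sum(ux**2) + np.sum(uy**2)

u = np.random.rand(256, 256)     # a stand-in for a grayscale 256x256 image
flat = np.full((256, 256), 0.5)  # a constant image has zero Dirichlet energy
print(dirichlet_energy(flat))    # 0.0
```

A constant image minimizes this energy, which is why minimizing $E$ by itself smooths the image toward flatness; in denoising applications one adds a data-fidelity term to prevent that.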

As I understand it, one can calculate the first variation of $E$:

$\delta E(u,u_x,u_y)(h(x,y))$ according to http://en.wikipedia.org/wiki/First_variation,

where $h$ is basically some unknown image (i.e., a scalar-valued 2D function).

In doing so, one eventually reaches a point where integration by parts is used, and it is assumed that the end points of $u$, i.e. the boundary of the image, are fixed. The equation then simplifies, and becomes:

$\delta E(u,u_x,u_y)(h(x,y))=-\int_\Omega(2u_{xx} + 2u_{yy})h(x,y)dxdy$

(Note the minus sign, which integration by parts produces; it is immaterial for the condition $2u_{xx}+2u_{yy}=0$ below but matters for the sign of the gradient-descent update.)

Okay. Now, if we knew $h$ then calculating $\delta E(u,u_x,u_y)(h(x,y))$ would be trivial. However, in general we don't know $h$ (though methods exist for solving for it, see http://tau.ac.il/~barleah/papers/Siam09.pdf).

However, if we can force $2u_{xx}+2u_{yy}=0$ for all $(x,y)\in\Omega$, then by definition $\delta E(u,u_x,u_y)(h(x,y))=0$. This equation is the Euler-Lagrange equation.
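The Euler-Lagrange expression here is (twice) the Laplacian of $u$, so its discrete analogue is the familiar 5-point stencil. A quick numerical check, using a linear function as a hypothetical example since linear functions are harmonic:

```python
import numpy as np

def laplacian(u):
    """Five-point-stencil approximation of u_xx + u_yy at interior pixels."""
    return (u[1:-1, 2:] + u[1:-1, :-2] + u[2:, 1:-1] + u[:-2, 1:-1]
            - 4.0 * u[1:-1, 1:-1])

# A linear function u(x, y) = 2x + 3y is harmonic, so the Euler-Lagrange
# expression 2(u_xx + u_yy) should vanish everywhere in the interior.
x, y = np.meshgrid(np.arange(8), np.arange(8))
u = 2.0 * x + 3.0 * y
print(np.allclose(2.0 * laplacian(u), 0.0))  # True
```

Note that the stencil is only defined at interior pixels; how to handle the boundary is exactly the issue raised in the question and the answer below.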

Now one typically minimizes $E$ using gradient descent (see Euler-Lagrange, Gradient Descent, Heat Equation and Image Denoising).

Introducing a time step $t$, and given an initial image $u(x,y,0)$, one then performs updates as:

$u(x,y,t+1) = u(x,y,t) + u_{xx}(x,y,t) + u_{yy}(x,y,t)$
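This update is explicit Euler integration of the heat equation. A minimal sketch, in which the explicit step size `dt` is my addition (constant factors such as the 2 in the Euler-Lagrange expression are absorbed into it; the update as written above corresponds to a unit step):

```python
import numpy as np

def heat_step(u, dt=0.25):
    """One explicit gradient-descent / heat-equation step.

    Interior pixels move by dt * (u_xx + u_yy); boundary pixels stay fixed,
    matching the assumption h = 0 on the boundary. dt <= 0.25 keeps the
    explicit scheme for the 5-point Laplacian stable.
    """
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[1:-1, 2:] + u[1:-1, :-2]
                       + u[2:, 1:-1] + u[:-2, 1:-1] - 4.0 * u[1:-1, 1:-1])
    return u + dt * lap

# the Dirichlet energy decreases along the iteration, smoothing the image
rng = np.random.default_rng(0)
u = rng.random((64, 64))
for _ in range(100):
    u = heat_step(u)
```

Note that this sketch freezes the boundary pixels, which is one concrete way of respecting the $h=0$ boundary assumption discussed below.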

Question


In order to arrive at:

$\delta E(u,u_x,u_y)(h(x,y))=-\int_\Omega(2u_{xx} + 2u_{yy})h(x,y)dxdy$

we assumed that $h(x,y)=0$ on the boundary of the image (see When the Euler Lagrange equation simplifies to zero).

What consequences does this assumption of $h(x,y)=0$ on the boundary of the image have on the solution? (Source: "the Euler-Lagrange equation is derived using integration by parts on the assumption that all candidate functions take the same values at the endpoints" —When the Euler Lagrange equation simplifies to zero)

Are only variations that don't modify the boundaries of the image explored? That is, is the solution we obtain the minimum of $E$ under the assumption that $u(x,y,\infty) = u(x,y,0)$ on the boundary of $\Omega$?

Best Answer

In this answer I explained why I wouldn't put too much stock in the paper that the first question you linked to refers to. The sloppiness of that paper is relevant to your present question in two respects: the treatment of the boundaries and the treatment of the discretization.

Regarding discretization, the paper doesn't make a clean distinction between the continuous and the discretized versions of the problem. This is important, however, since regarding the Euler-Lagrange expression as a sort of gradient is more subtle in the continuous case. Remember that this expression is derived by integrating by parts in order to write the first variation as an integral of the increment $h$ times some function $g$, which comes out as the Euler-Lagrange expression. Regarding $g$ as the gradient of the objective function then relies on the fact that $g$ is the direction in which the integral of $gh$ is maximal among all functions with the same $L^2$ norm. However, if $h$ is constrained to vanish at the boundary and $g$ doesn't, then there is no such direction of maximal first variation, since $h$ can be made arbitrarily similar to $g$ but never equal to it. Moreover, the more similar we make the search direction to $g$ (in the $L^2$ sense) by approaching the value of $g$ ever more rapidly near the boundary, the less suitable the resulting function becomes for a gradient descent step: the steeper we make the changes at the boundary, the smaller the steps we can take.

Thus, in the continuous case the whole approach only makes sense if it's known a priori that the Euler-Lagrange expression will vanish at the boundaries, and in that case the function being optimized will never change at the boundaries. In the discrete case, the problem isn't as fundamental, since we can just add the constraint that $h$ vanishes at the boundaries and thus get a function that maximizes the first variation in the $\ell^2$ sense; whether this is a useful search direction is another question.

The paper doesn't actually, as far as I can tell, say how the boundaries are treated; the derivatives approximated as differences don't make sense at the boundaries the way they're written. One sensible way to fill that gap would be to fill the rows and columns around the boundary either with certain constant values or by duplicating the values inside the boundary, which would correspond to a zero gradient; in both cases the boundary terms from the integration by parts would vanish, but the Euler-Lagrange expression wouldn't necessarily.
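The two gap-filling strategies mentioned above can be sketched with NumPy's padding modes; `pad_replicate` and `pad_constant` are hypothetical names for illustration, not anything from the paper:

```python
import numpy as np

def pad_replicate(u):
    """Duplicate the values just inside the boundary (zero normal
    derivative, i.e. a discrete zero-gradient condition)."""
    return np.pad(u, 1, mode="edge")

def pad_constant(u, value=0.0):
    """Surround the image with a fixed constant value."""
    return np.pad(u, 1, mode="constant", constant_values=value)

u = np.arange(9.0).reshape(3, 3)
p = pad_replicate(u)
# with replicated padding, the one-sided difference across the boundary
# is zero, so the boundary terms from integration by parts vanish
print(p[0, 1] - p[1, 1])  # 0.0
```

With either padding, the 5-point Laplacian becomes well defined at every original pixel, but, as the answer notes, nothing forces the Euler-Lagrange expression itself to vanish there.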

To answer your question more specifically: No, the fact that the boundary conditions were used in deriving the Euler-Lagrange expression generally doesn't imply that they will be satisfied at the end; unless the Euler-Lagrange expression happens to vanish at the boundaries, you have to make sure they're satisfied by the search directions. If you don't, you may well still get a practically useful method, but you won't be rigorously minimizing the objective function using gradient descent, since you'll be neglecting the boundary terms from the integration by parts. Finally, if you don't want the boundary conditions to be satisfied, then you can't use the Euler-Lagrange expression as a gradient, or at least you'll have to find some way to deal with the boundary terms.