Solved – energy minimization in machine learning

I was reading about optimization for an ill-posed problem in computer vision and came across the explanation of optimization below on Wikipedia. What I don't understand is why this kind of optimization is called "energy minimization" in computer vision.

An optimization problem can be represented in the following way:

Given: a function $f: A \to \mathbb{R}$ from some set $A$ to the real numbers

Sought: an element $x_0$ in $A$ such that $f(x_0) \le f(x)$ for all $x$ in $A$
("minimization") or such that $f(x_0) \ge f(x)$ for all $x$ in $A$
("maximization").

Such a formulation is called an optimization problem
or a mathematical programming problem (a term not directly related to
computer programming, but still in use for example in linear
programming – see History below). Many real-world and theoretical
problems may be modeled in this general framework. Problems formulated
using this technique in the fields of physics and computer vision may
refer to the technique as energy minimization, speaking of the value
of the function $f$ as representing the energy of the system being
modeled.

Best Answer

Energy-based models are a unified framework for representing many machine learning algorithms. They interpret inference as minimizing an energy function and learning as minimizing a loss functional.

The energy function is a function of the configuration of latent variables and of the inputs provided in an example. Inference typically means finding a low-energy configuration, or sampling from the possible configurations so that the probability of choosing a given configuration follows a Gibbs distribution, i.e. is proportional to $e^{-E}$ for a configuration with energy $E$.
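To make the two notions of inference concrete, here is a minimal sketch in plain Python. The quadratic `energy`, the candidate set, and the inverse temperature `beta` are all assumptions chosen for illustration; they do not come from the papers cited below.

```python
import math
import random


def energy(x, y):
    """Hypothetical energy: low when configuration y is close to input x."""
    return (x - y) ** 2


def infer_min_energy(x, candidates):
    """Inference as energy minimization: return the lowest-energy configuration."""
    return min(candidates, key=lambda y: energy(x, y))


def gibbs_probabilities(x, candidates, beta=1.0):
    """Gibbs distribution: P(y | x) is proportional to exp(-beta * E(x, y))."""
    weights = [math.exp(-beta * energy(x, y)) for y in candidates]
    total = sum(weights)  # normalizing constant (partition function)
    return [w / total for w in weights]


def sample_configuration(x, candidates, beta=1.0, rng=random):
    """Inference as sampling: draw one configuration from the Gibbs distribution."""
    probs = gibbs_probabilities(x, candidates, beta)
    return rng.choices(candidates, weights=probs, k=1)[0]


candidates = [0, 1, 2, 3]  # assumed small, discrete configuration space
x = 1.2                    # assumed observed input

best = infer_min_energy(x, candidates)
probs = gibbs_probabilities(x, candidates)
sample = sample_configuration(x, candidates)
```

With these assumed values, the lowest-energy configuration is also the most probable one under the Gibbs distribution; a larger `beta` concentrates the samples more tightly around it.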

The loss functional is a function of the model parameters given many examples. E.g., in a supervised learning problem, your loss is the total error at the targets. It's sometimes called a "functional" because it's a function of the (parametrized) function that constitutes the model.
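As a rough illustration of learning as loss minimization, the sketch below fits a single parameter `w` by gradient descent. The energy $E_w(x, y) = (y - w x)^2$, the toy data, the learning rate, and the step count are assumptions made for this example; the loss is simply the energy summed over the training targets, which is only one of the loss functionals an energy-based model can use.

```python
def energy(w, x, y):
    """Hypothetical parametrized energy: squared error of a linear model."""
    return (y - w * x) ** 2


def loss(w, examples):
    """Loss as a function of the parameter w: total energy at the targets."""
    return sum(energy(w, x, y) for x, y in examples)


def loss_gradient(w, examples):
    """Derivative of the loss with respect to w."""
    return sum(-2.0 * x * (y - w * x) for x, y in examples)


def fit(examples, w=0.0, learning_rate=0.01, steps=500):
    """Learning: plain gradient descent on the loss."""
    for _ in range(steps):
        w -= learning_rate * loss_gradient(w, examples)
    return w


# Assumed toy data generated by y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_fit = fit(examples)
```

Note the division of labor: inference would minimize `energy` over `y` with `w` held fixed, while learning minimizes `loss` over `w` with the examples held fixed.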

Major paper:

LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. J. (2006). A tutorial on energy-based learning. In Predicting Structured Data. MIT Press.

Also see:

LeCun, Y., & Huang, F. J. (2005). Loss Functions for Discriminative Training of Energy-Based Models. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AIStats’05). Retrieved from http://yann.lecun.com/exdb/publis/pdf/lecun-huang-05.pdf

Ranzato, M., Boureau, Y.-L., Chopra, S., & LeCun, Y. (2007). A Unified Energy-Based Framework for Unsupervised Learning. Proc. Conference on AI and Statistics (AI-Stats). Retrieved from http://dblp.uni-trier.de/db/journals/jmlr/jmlrp2.html#RanzatoBCL07