Solved – Loss function for linear regression with calculus of variations


I'm struggling with the mathematics behind linear regression. Below I have pasted text from the book Pattern Recognition and Machine Learning (p. 46), where the author derives the regression function $\mathbb{E}_{t} [t | \mathbf{x}]$. I want to understand the procedure from equation (2) to the final result. Could somebody please provide some useful pointers (and/or links) on which concepts from the calculus of variations I should study?

The average, or expected, loss is given by

$$
\mathbb{E}[L] = \int \int L(t, y (\mathbf{x})) p (\mathbf{x}, t) \, d\mathbf{x} \, dt.
\tag{1}
$$

A common choice of loss function in linear regression is the squared loss given by $L (t, y(\mathbf{x})) = \{ y (\mathbf{x}) - t \}^{2}$. In this case, the expected loss can be written as

$$
\mathbb{E}[L] = \int \int \{ y (\mathbf{x}) - t \}^{2} p (\mathbf{x}, t) \, d\mathbf{x} \, dt.
\tag{2}
$$

Our goal is to choose $y (\mathbf{x})$ so as to minimize $\mathbb{E} [L]$. We can do this using the calculus of variations to give

$$
\dfrac{\delta \mathbb{E} [L]}{\delta y (\mathbf{x})} = 2 \int \{ y (\mathbf{x}) - t \} p (\mathbf{x}, t) \, dt = 0.
\tag{3}
$$
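
One way to see how (3) follows from (2), without the full variational machinery, is to note that the integrand contains $y(\mathbf{x})$ itself but none of its derivatives, so $\mathbb{E}[L]$ can be minimized pointwise, for each $\mathbf{x}$ separately (a sketch of the argument, not Bishop's exact wording):

$$
\mathbb{E}[L] = \int \left[ \int \{ y (\mathbf{x}) - t \}^{2} p (\mathbf{x}, t) \, dt \right] d\mathbf{x},
\qquad
\frac{\partial}{\partial y (\mathbf{x})} \int \{ y (\mathbf{x}) - t \}^{2} p (\mathbf{x}, t) \, dt = 2 \int \{ y (\mathbf{x}) - t \} p (\mathbf{x}, t) \, dt.
$$

Setting this derivative to zero for every $\mathbf{x}$ reproduces (3).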

Solving for $y (\mathbf{x})$, and using the sum and product rules of probability, we obtain

$$
y (\mathbf{x}) = \dfrac{\int t \, p (\mathbf{x}, t) \, dt}{p (\mathbf{x})} = \int t \, p (t | \mathbf{x}) \, dt = \mathbb{E}_{t} [t | \mathbf{x}]
\tag{4}
$$
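
Spelled out, the step from (3) to (4) rearranges and applies the sum rule $p (\mathbf{x}) = \int p (\mathbf{x}, t) \, dt$ and the product rule $p (\mathbf{x}, t) = p (t | \mathbf{x}) \, p (\mathbf{x})$:

$$
\int y (\mathbf{x}) \, p (\mathbf{x}, t) \, dt = \int t \, p (\mathbf{x}, t) \, dt
\;\Longrightarrow\;
y (\mathbf{x}) \, p (\mathbf{x}) = \int t \, p (\mathbf{x}, t) \, dt
\;\Longrightarrow\;
y (\mathbf{x}) = \int t \, \frac{p (\mathbf{x}, t)}{p (\mathbf{x})} \, dt = \int t \, p (t | \mathbf{x}) \, dt.
$$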

Best Answer

I am assuming your difficulty is in the jump from Eq. (2) to Eq. (3). All you need is the Euler–Lagrange equation from the calculus of variations. In the standard notation, $f(x, y, \dot y)$ would be your $\int \{y(\mathbf{x})-t\}^2 p(\mathbf{x},t) \, dt$, i.e. the inner integral over $t$. Since this $f$ does not depend on $\dot y$, we have $\partial f / \partial \dot y = 0$, and the Euler–Lagrange equation $\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial \dot y} = 0$ reduces to $\frac{\partial f}{\partial y} = 0$, which is exactly Eq. (3).
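
If a numerical sanity check helps, here is a minimal Python sketch (the joint distribution, the point $x_0$, and all variable names are illustrative assumptions, not anything from the book or this thread). It approximates $p(t | x = x_0)$ by conditioning on a narrow slice around $x_0$ and confirms that the constant prediction minimizing the squared loss is the conditional mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint distribution p(x, t) (my choice, not from the book):
# x ~ Uniform(0, 1), t = sin(2*pi*x) + Gaussian noise
n = 200_000
x = rng.uniform(0.0, 1.0, n)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

# Approximate the conditional p(t | x = x0) with a narrow slice around x0
x0, width = 0.25, 0.01
t_slice = t[np.abs(x - x0) < width]

# Brute-force search for the constant prediction c minimizing the squared loss
cands = np.linspace(t_slice.min(), t_slice.max(), 2001)
losses = [np.mean((c - t_slice) ** 2) for c in cands]
c_star = cands[int(np.argmin(losses))]

print(f"argmin of squared loss : {c_star:.4f}")
print(f"conditional mean E[t|x]: {t_slice.mean():.4f}")  # agree up to grid spacing
```

The brute-force search stands in for the variational argument: on the slice, the expected loss is a quadratic in the constant $c$, so its minimizer is the sample mean, mirroring Eq. (4).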
