I am currently studying the textbook Algorithms for Optimization by Mykel J. Kochenderfer and Tim A. Wheeler. Section 1.6.2 (Multivariate) says the following:
The following conditions are necessary for $\mathbf{x}$ to be at a local minimum of $f$:
$\nabla f(\mathbf{x}) = 0$, the first-order necessary condition (FONC)
$\nabla^2 f(\mathbf{x})$ is positive semidefinite (for a review of this definition, see appendix C.6), the second-order necessary condition (SONC)
The FONC and SONC are generalizations of the univariate case. The FONC tells us that the function is not changing at $\mathbf{x}$. Figure 1.8 shows examples of multivariate functions where the FONC is satisfied. The SONC tells us that $\mathbf{x}$ is in a bowl.
The FONC and SONC can be obtained from a simple analysis. In order for $\mathbf{x}^*$ to be at a local minimum, it must be smaller than those values around it:
$$f(\mathbf{x}^*) \le f(\mathbf{x}^* + h \mathbf{y}) \iff f(\mathbf{x}^* + h\mathbf{y}) - f(\mathbf{x}^*) \ge 0 \tag{1.14}$$
If we write the second-order approximation of $f$ about $\mathbf{x}^*$, we get:
$$ f(\mathbf{x}^* + h \mathbf{y}) = f(\mathbf{x}^*) + h \nabla f(\mathbf{x}^*)^T \mathbf{y} + \dfrac{1}{2} h^2 \mathbf{y}^T \nabla^2 f(\mathbf{x}^*)\mathbf{y} + O(h^3) \tag{1.15}$$
I'm wondering where the $O(h^3)$ term in (1.15) comes from. I don't see why it should appear algebraically.
I would appreciate it if someone could take the time to clarify this.
Best Answer
The $O(h^3)$ term comes from Taylor's theorem: if $f$ is three times continuously differentiable, the error left after truncating the expansion at the quadratic term is bounded by a constant multiple of $h^3$ for $h$ near $0$. More generally, $O(h^3)$ means the estimation error is locally bounded by a constant times $h^3$. For example, the second-order Taylor estimation of $f(x) = e^x$ at $x = 2$ is $g(x) = e^2(0.5x^2 - x + 1)$, so $f(x) = g(x) + O((x-2)^3)$ as $x \to 2$. The higher the power of the error term, the more rapidly it goes to $0$ as $x \to 2$. Note that this example shows the bound is local: it need not hold on the entirety of $\mathbb{R}$.
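You can also check the $O(h^3)$ behaviour of (1.15) numerically. Below is a minimal sketch (my own example, not from the book) using a hypothetical test function $f(\mathbf{x}) = e^{x_0 + 2x_1}$, chosen because its gradient and Hessian are easy to write down analytically. If the truncation error really is $O(h^3)$, then the ratio error$/h^3$ should settle toward a constant as $h \to 0$:

```python
import numpy as np

# Hypothetical test function: f(x) = exp(x0 + 2*x1).
def f(x):
    return np.exp(x[0] + 2 * x[1])

def grad(x):
    v = f(x)
    return np.array([v, 2 * v])          # analytic gradient

def hess(x):
    v = f(x)
    return np.array([[v, 2 * v],
                     [2 * v, 4 * v]])    # analytic Hessian

x = np.array([0.1, -0.2])   # expansion point x*
y = np.array([1.0, 0.5])    # fixed direction y

for h in [1e-1, 1e-2, 1e-3]:
    # Second-order expansion from equation (1.15), without the O(h^3) term.
    approx = f(x) + h * grad(x) @ y + 0.5 * h**2 * (y @ hess(x) @ y)
    err = abs(f(x + h * y) - approx)
    # If err = O(h^3), this ratio approaches a constant as h shrinks.
    print(f"h={h:g}  error={err:.3e}  error/h^3={err / h**3:.4f}")
```

Each tenfold shrink of $h$ cuts the error by roughly a factor of $1000$, while the ratio error$/h^3$ stays essentially constant, which is exactly what the $O(h^3)$ notation asserts.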