It turns out that there are two different but related notions of differentiation for a function $f:\mathbb R^n\to\mathbb R$: the total derivative $df$ and the gradient $\nabla f$.
- The total derivative is a covector ("dual vector", "linear form") and does not depend on the choice of a metric ("measure of length").
- The gradient is an ordinary vector derived from the total derivative, but it depends on a metric. That is why it looks a bit odd in some coordinate systems.
The definition of the total derivative answers the following question: given a vector $\vec v$, what is the slope of the function $f$ in the direction of $\vec v$? The answer is, of course,
$$ df_{x}(\vec v) = \lim_{t\to0} \frac{f(x+t\vec v)-f(x)}{t}$$
That is, you start at the point $x$, walk a tiny bit in the direction of $\vec v$, and take note of the ratio $\Delta f/\Delta t$.
Note that the total derivative is a linear map $\mathbb R^n \to \mathbb R$, not a vector in $\mathbb R^n$. Given a vector, it tells you some number. In coordinates, this is usually written as
$$ df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz $$
where $dx,dy,dz$ are the total derivatives of the coordinate functions, for instance $dx(v_x,v_y,v_z) := v_x$. This formula looks the same in any coordinate system.
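As a quick numerical sanity check (the function $f$ below is just an illustrative example), the limit definition and the coordinate formula agree:

```python
import numpy as np

def f(p):
    # example function f(x, y, z) = x^2 * y + z
    x, y, z = p
    return x**2 * y + z

def directional_derivative(f, x, v, t=1e-6):
    # central-difference approximation of df_x(v)
    return (f(x + t * v) - f(x - t * v)) / (2 * t)

x = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 0.0, 1.0])

# coordinate formula: df = (2xy) dx + (x^2) dy + 1 dz
partials = np.array([2 * x[0] * x[1], x[0]**2, 1.0])

print(directional_derivative(f, x, v))  # limit definition
print(partials @ v)                     # df applied to v via the coordinate formula
```

Both lines print (approximately) the same number, as they must.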
In contrast, the gradient answers the following question: what is the direction of steepest ascent of the function? That is, which vector $\vec v$ of unit length maximizes $df(\vec v)$? As you can see, this definition crucially depends on the fact that you can measure the length of a vector. The gradient is then defined as
$$ \nabla f = df(\vec v_{max})\cdot\vec v_{max} $$
i.e. it gives both the direction and the magnitude of the steepest change.
This can also be expressed as
$$ \langle \nabla f, \vec v \rangle = df(\vec v) \quad\forall \vec v\in\mathbb R^n.$$
In other words, the scalar product $\langle\cdot,\cdot\rangle$ is used to convert the covector $df$ into the vector $\nabla f$. This also means that the formula for the gradient looks very different in coordinate systems other than Cartesian ones. If the scalar product is changed (say, to $\langle\vec a,\vec b\rangle := a_xb_x + a_yb_y + 4a_zb_z$), then the direction of steepest ascent also changes. (Exercise: Why?)
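To see the metric dependence concretely: writing the modified scalar product as $\langle\vec a,\vec b\rangle = \vec a^T M \vec b$ with $M = \operatorname{diag}(1,1,4)$, the defining property $\langle\nabla f,\vec v\rangle = df(\vec v)$ for all $\vec v$ gives $\nabla f = M^{-1}(df)$, where $(df)$ denotes the column of partial derivatives. A small sketch (the sample covector is made up):

```python
import numpy as np

# df at some point, as a covector (the row of partial derivatives)
df = np.array([1.0, 0.0, 2.0])

# Euclidean metric: the gradient has the same components as df
grad_euclid = df

# modified scalar product <a, b> = a_x b_x + a_y b_y + 4 a_z b_z
M = np.diag([1.0, 1.0, 4.0])

# defining property <grad, v>_M = df(v) for all v gives grad = M^{-1} df
grad_M = np.linalg.solve(M, df)

print(grad_euclid)  # [1. 0. 2.]
print(grad_M)       # [1. 0. 0.5] -- the direction of steepest ascent changed
```

Note that $\langle \nabla f, \vec v\rangle_M = (M^{-1}df)^T M \vec v = df \cdot \vec v$ for every $\vec v$, so the defining equation is indeed satisfied.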
Although many examples of Lagrangian duality in textbooks involve functions and constraints for which it is easy to minimize the Lagrangian for a fixed Lagrange multiplier, there are situations where there is no explicit formula for the dual function. For example, consider the problem:
$\min \| Ax - b \|_{1}$
subject to
$Cx=d$.
The Lagrangian is
$L(x,\lambda)=\| Ax-b \|_{1} + \lambda^{T}(Cx-d)$
The dual function is
$g(\lambda)=\inf_{x}\left(\| Ax-b \|_{1} + \lambda^{T}(Cx-d)\right)$.
There's no explicit formula for $g(\lambda)$, although it's possible to evaluate $g(\lambda)$ by solving an LP.
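Here is a sketch of that LP evaluation with `scipy.optimize.linprog` (the problem data below are made up for illustration): introducing $s$ with $-s \le Ax-b \le s$ turns $\|Ax-b\|_1$ into the linear objective $\mathbf{1}^T s$, so $g(\lambda)$ is the optimal value of an LP in $(x,s)$.

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical problem data, for illustration only
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, -1.0]])
d = np.array([0.0])

def dual_function(lam):
    """Evaluate g(lam) = inf_x ||Ax - b||_1 + lam^T (Cx - d) by solving an LP.

    With -s <= Ax - b <= s, the objective becomes
    (C^T lam)^T x + 1^T s - lam^T d, which is linear in (x, s).
    """
    m, n = A.shape
    c = np.concatenate([C.T @ lam, np.ones(m)])
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n + [(0, None)] * m)
    if not res.success:
        return -np.inf  # g(lam) = -infinity when the LP is unbounded below
    return res.fun - lam @ d

print(dual_function(np.array([0.1])))
```

For $\lambda$ where the infimum is $-\infty$ (i.e. outside the domain of $g$), the LP is unbounded and the sketch returns $-\infty$.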
Best Answer
Think about the differentiability of a function $f:\left(a,b\right)\subset\mathbb{R}\to \mathbb{R}$. If $x\in \left(a,b\right)$, then $f'\left(x\right)$ is ordinarily defined to be the real number
\begin{equation*} f'\left(x\right) = \lim_{h\to 0}{\frac{f\left(x+h\right)-f\left(x\right)}{h}}, \end{equation*}
provided that the limit exists. Therefore, we can write
\begin{equation*} f\left(x+h\right) - f\left(x\right) = f'\left(x\right)h + r\left(h\right) \end{equation*}
where $r\left(h\right)$ is a remainder term that satisfies
\begin{equation*} \lim_{h\to 0}{\frac{r\left(h\right)}{h}} = 0. \end{equation*}
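You can watch this remainder vanish numerically (the function below is just an example): for $f(x)=x^3$ one has $r(h) = 3xh^2 + h^3$, so $r(h)/h = 3xh + h^2 \to 0$.

```python
# check that r(h) = f(x+h) - f(x) - f'(x) h satisfies r(h)/h -> 0
def f(x):
    return x**3

def fprime(x):
    return 3 * x**2

x = 2.0
for h in [1e-1, 1e-2, 1e-3]:
    r = f(x + h) - f(x) - fprime(x) * h
    print(h, r / h)  # here r/h = 3*x*h + h**2, shrinking linearly in h
```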
This motivates the definition of differentiability for a function $f$ from an open subset $S\subset\mathbb{R}^{n}$ to $\mathbb{R}$. Namely, $f:S\subset\mathbb{R}^{n}\to\mathbb{R}$ is differentiable at the point $\mathbf{a}\in S$ if there exists $\mathbf{c}\in \mathbb{R}^{n}$ such that
\begin{equation*} \lim_{\mathbf{h}\to 0}{\frac{f\left(\mathbf{a}+\mathbf{h}\right)-f\left(\mathbf{a}\right) - \mathbf{c}\cdot\mathbf{h}}{\lVert\mathbf{h}\rVert}} = 0. \end{equation*}
In this case, $\mathbf{c}$ is called the gradient of $f$ at $\mathbf{a}$ and is denoted $\nabla f\left(\mathbf{a}\right)$.
If we define $E\left(\mathbf{h}\right) = f\left(\mathbf{a}+\mathbf{h}\right) - f\left(\mathbf{a}\right) - \nabla f\left(\mathbf{a}\right)\cdot\mathbf{h}$, then we can write
\begin{equation*} f\left(\mathbf{a}+\mathbf{h}\right) = f\left(\mathbf{a}\right) + \nabla f\left(\mathbf{a}\right)\cdot\mathbf{h} + E\left(\mathbf{h}\right)\,\,\,\,\mbox{ where }\,\,\,\,\frac{E\left(\mathbf{h}\right)}{\lVert\mathbf{h}\rVert}\to 0\,\,\,\,\mbox{ as }\,\,\,\,\mathbf{h}\to 0. \end{equation*}
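The condition $E(\mathbf{h})/\lVert\mathbf{h}\rVert \to 0$ is easy to observe numerically (the function and the point below are illustrative choices):

```python
import numpy as np

def f(p):
    x, y = p
    return np.sin(x) * y

def grad_f(p):
    x, y = p
    return np.array([np.cos(x) * y, np.sin(x)])

a = np.array([0.5, 1.5])
h = np.array([1.0, -2.0])

# E(h) = f(a + h) - f(a) - grad_f(a) . h; the ratio E(h)/||h|| must vanish
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    ht = t * h
    E = f(a + ht) - f(a) - grad_f(a) @ ht
    print(t, E / np.linalg.norm(ht))  # ratio shrinks roughly linearly in t
```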
The motivation for this definition is that it allows us to view $f\left(\mathbf{a}\right) + \nabla f\left(\mathbf{a}\right)\cdot\mathbf{h}$ as a linear (or rather, affine) approximation to $f\left(\mathbf{a}+\mathbf{h}\right)$.
In your case, adjust the notation by taking $\mathbf{a} = \overline{x}$ and $\mathbf{h} = x-\overline{x}$, and write the dot product $\nabla f\left(\overline{x}\right)\cdot\mathbf{h}$ as the matrix product $\nabla f\left(\overline{x}\right)^{t}\left(x-\overline{x}\right)$; then this becomes
\begin{equation*} f\left(x\right) = f\left(\overline{x}\right) + \nabla f\left(\overline{x}\right)^{t}\left(x-\overline{x}\right) + E\left(x-\overline{x}\right) \end{equation*}
where
\begin{equation*} \frac{E\left(x-\overline{x}\right)}{\lVert x-\overline{x}\rVert}\to 0\,\,\,\,\mbox{ as }\,\,\,\,x-\overline{x}\to 0. \end{equation*}
Then you can see that
\begin{equation*} E\left(x-\overline{x}\right) = \lVert x-\overline{x}\rVert\,\beta\left(\overline{x};x\right) \implies \beta\left(\overline{x};x\right) = \frac{E\left(x-\overline{x}\right)}{\lVert x-\overline{x}\rVert}, \end{equation*}
so the requirement that $E\left(x-\overline{x}\right)/\lVert x-\overline{x}\rVert\to 0$ as $x\to\overline{x}$ (i.e., as $x-\overline{x}\to 0$) is equivalent to $\beta\left(\overline{x};x\right)\to 0$ as $x\to\overline{x}$.
Oh, and the semicolon is just used to emphasize that $\beta$ depends on both $x$ and $\overline{x}$; it could just as well be written $\beta\left(\overline{x},x\right)$, or in any other way that makes this clear.