[Math] Strictly increasing, strictly convex function: is the second derivative positive?

calculus, convex-analysis, real-analysis

Consider a twice continuously differentiable function $f \colon \mathbb{R} \to \mathbb{R}$. While $f''(x)>0\ \forall x$ implies strict convexity of $f$, the converse is not true (e.g. $f(x)=x^4$, strictly convex but $f''(0)=0$).

I was wondering whether the additional requirement of $f$ being strictly increasing can ensure $f''(x)>0$. It would at least rule out the example above.

As I understand this answer, the requirement is sufficient for (once) differentiable functions, but for twice differentiable ones it states:

$f$ is strictly convex if and only if $f'' \geqslant 0$ everywhere and $f''$ does not vanish on any non-empty open interval $J \subset I$.

Is it correct that this condition is satisfied when $f$ is strictly increasing, and hence that, for $f$ as above, being strictly increasing and strictly convex together implies $f''(x)>0\ \forall x$?

Best Answer

Take $f(x) = x^4$ on $[0,1]$. It is strictly increasing and strictly convex, but $f''(0) = 0$.
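
The example above lives on $[0,1]$, whereas the question asks about functions defined on all of $\mathbb R$. As a sketch of how the same phenomenon can occur on the whole line, consider the function $F$ below. It is an illustrative assumption introduced here, not part of the original answer.

\begin{align*} F(x)&=\int_0^x e^{t^3}\,dt,\\ F'(x)&=e^{x^3}>0,\\ F''(x)&=3x^2e^{x^3}\geq 0,\qquad F''(0)=0. \end{align*}

Since $F'>0$, $F$ is strictly increasing. Since $x\mapsto e^{x^3}$ is strictly increasing, $F'$ is strictly increasing, so $F$ is strictly convex. $F''$ is continuous and vanishes only at $x=0$, hence on no non-empty open interval, which is consistent with the quoted characterization. Yet $F''(0)=0$, so strict monotonicity and strict convexity together still do not force $F''>0$ everywhere.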

Related Solutions

[Math] Convex function with non-symmetric Hessian

It turns out that twice-differentiability implies that the Hessian is symmetric even without convexity and with no reference to whether the second-order partial derivatives are continuous! The proof below is based on Theorem 8.12.2 in the book Foundations of Modern Analysis by Dieudonné (1969, p. 180).

Claim: Let $U\subseteq\mathbb R^n$ be an open set and $f:U\to\mathbb R$ a function. Suppose that $f$ is (Fréchet) differentiable on $U$ and that it is twice (Fréchet) differentiable at $\mathbf x_0\in U$. Then, the Hessian matrix $\mathbf H(\mathbf x_0)$ at $\mathbf x_0$ is symmetric.

Proof: Let $\mathbf D:U\to\mathbb R^n$ denote the gradient function of $f$. Fix $\varepsilon>0$. Since $\mathbf D$ is Fréchet differentiable at $\mathbf x_0$ by assumption, it follows that there exists some $\delta>0$ such that $\|\mathbf v\|<4\delta$ implies that $$\left\|\mathbf D(\mathbf x_0+\mathbf v)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot\mathbf v\right\|\leq\varepsilon\|\mathbf v\|.$$ There is no loss of generality in taking $\delta$ to be so small that the open ball $B(4\delta,\mathbf x_0)$ is contained in the open set $U$.

For any $i,j\in\{1,\ldots,n\}$, let $\mathbf e_i$ and $\mathbf e_j$ be the corresponding standard basis vectors of unit length. Let $\mathbf s\equiv\delta\mathbf e_i$ and $\mathbf t\equiv\delta\mathbf e_j$. It is clear that $\mathbf x_0+\xi\mathbf s+\mathbf t$ and $\mathbf x_0+\xi\mathbf s$ are both in $U$ whenever $\xi\in[0,1]$; this is because $\|\xi\mathbf s+\mathbf t\|<4\delta$ and $\|\xi\mathbf s\|<4\delta$. Define the following function $g:[0,1]\to\mathbb R$: $$g(\xi)\equiv f(\mathbf x_0+\xi\mathbf s+\mathbf t)-f(\mathbf x_0+\xi\mathbf s)\quad\forall\xi\in[0,1].$$

Clearly, $g$ is continuous on $[0,1]$ and differentiable on $(0,1)$. Lagrange's mean-value theorem, in turn, implies that there exists some $\xi\in(0,1)$ such that $$g(1)-g(0)=g'(\xi)=\mathbf s\cdot\left[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0+\xi\mathbf s)\right],$$ using the chain rule.

Next, one can derive the following chain of inequalities (the first one uses the Cauchy–Schwarz inequality): \begin{align*} &\left|g(1)-g(0)-\mathbf s\cdot\mathbf H(\mathbf x_0)\cdot\mathbf t\right|\leq\underbrace{\|\mathbf s\|}_{=\delta}\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)]-\mathbf H(\mathbf x_0)\cdot\mathbf t\right\|\\ =&\,\delta\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s+\mathbf t)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s)]\right\|\\ \leq&\,\delta\varepsilon\left(\|\xi\mathbf s+\mathbf t\|+\|\xi\mathbf s\|\right)<8\delta^2\varepsilon. \end{align*} That is, one has that $$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j|<8\delta^2\varepsilon,$$ and, by a completely analogous and symmetric reasoning in which $\mathbf s$ and $\mathbf t$ are interchanged, $$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_j\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_i|<8\delta^2\varepsilon.$$ Given that $\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j=h_{ij}(\mathbf x_0)\equiv\partial^2 f/(\partial x_i\partial x_j)(\mathbf x_0)$, the preceding two inequalities imply that $$\left|h_{ij}(\mathbf x_0)-h_{ji}(\mathbf x_0)\right|<16\varepsilon.$$ Taking $\varepsilon$ to be arbitrarily small, one sees that $h_{ij}(\mathbf x_0)=h_{ji}(\mathbf x_0)$. $\blacksquare$
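
The key object in the proof is the second-order difference $f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)$, which is unchanged when $\mathbf s$ and $\mathbf t$ are swapped and which, after division by $\delta^2$, approximates the mixed partial derivative. The following is a minimal numerical sketch of that approximation in two variables. The test function `f`, the point `(x0, y0)` and the step sizes are hypothetical choices made only for illustration.

```python
import math


def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x) * math.exp(y) + x**2 * y


def mixed_second_difference(func, x0, y0, delta):
    """Second-order difference from the proof, divided by delta**2.

    Uses s = delta * e_1 and t = delta * e_2. The expression is symmetric
    in s and t, so it approximates both h_12 and h_21 at (x0, y0).
    """
    return (
        func(x0 + delta, y0 + delta)
        - func(x0 + delta, y0)
        - func(x0, y0 + delta)
        + func(x0, y0)
    ) / delta**2


def exact_mixed_partial(x0, y0):
    # d^2 f / (dx dy) for the test function above: cos(x) * e^y + 2x.
    return math.cos(x0) * math.exp(y0) + 2 * x0


x0, y0 = 0.3, -0.2
deltas = (1e-1, 1e-2, 1e-3)
exact = exact_mixed_partial(x0, y0)
estimates = [mixed_second_difference(f, x0, y0, d) for d in deltas]
errors = [abs(e - exact) for e in estimates]

for d, e, err in zip(deltas, estimates, errors):
    print(f"delta = {d:.0e}  estimate = {e:.6f}  error = {err:.2e}")
```

The error shrinks as `delta` decreases, mirroring the bound $8\delta^2\varepsilon$ on the undivided difference in the proof.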

[Math] Conditions for Inverse Function Theorem

The condition is exactly that $f'(x)$ is nonzero (or equivalently, positive) for all $x$. Indeed, if $f^{-1}$ is differentiable everywhere, then differentiating the identity $f^{-1}(f(x))=x$ gives $(f^{-1})'(f(x))f'(x)=1$ so $f'(x)$ can never be zero.

Conversely, if $y=f(x)$ and $f'(x)\neq 0$, then $f^{-1}$ is differentiable at $y$ with $(f^{-1})'(y)=1/f'(x)$ (indeed, you can prove this directly from the definition, and the difference quotients to compute $(f^{-1})'(y)$ are the reciprocals of the difference quotients for $f'(x)$). So if $f'$ is always nonzero, then $f^{-1}$ is differentiable everywhere with $(f^{-1})'(y)=1/f'(f^{-1}(y))$. If $f$ is twice continuously differentiable, we can then differentiate $(f^{-1})'$ by the chain rule to find that $f^{-1}$ is twice continuously differentiable.
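
To spell out the last step, here is a sketch of the chain-rule computation. It writes $g\equiv f^{-1}$ (a symbol introduced only for this illustration) and assumes, as above, that $f$ is twice continuously differentiable with $f'$ never zero.

\begin{align*} g'(y)&=\frac{1}{f'(g(y))},\\ g''(y)&=-\frac{f''(g(y))\,g'(y)}{f'(g(y))^2}=-\frac{f''(g(y))}{f'(g(y))^3}. \end{align*}

The right-hand side is built from the continuous functions $f'$, $f''$ and $g$, with a denominator that never vanishes, so $g''$ is continuous.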

Note, however, that merely assuming $f$ is differentiable and strictly increasing does not imply that $f'$ is positive everywhere. For instance, $f(x)=x^3$ is strictly increasing and infinitely differentiable, but $f'(0)=0$ and $f^{-1}(x)=\sqrt[3]{x}$ is not differentiable at $0$.
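
A small numerical sketch of this failure: the difference quotients of the cube root at $0$ grow without bound as the step shrinks, while at a point where $f'\neq 0$ they settle near $1/f'(x)$. The helper names `cube_root` and `difference_quotient` and the sample points are assumptions made for this illustration.

```python
def cube_root(y):
    """Real cube root, i.e. the inverse of f(x) = x**3 on all of R."""
    return y ** (1.0 / 3.0) if y >= 0 else -((-y) ** (1.0 / 3.0))


def difference_quotient(g, y, h):
    """Forward difference quotient (g(y + h) - g(y)) / h."""
    return (g(y + h) - g(y)) / h


# Step sizes 1e-1, 1e-2, ..., 1e-6.
steps = [10.0 ** (-k) for k in range(1, 7)]

# At y = 0 = f(0), where f'(0) = 0, the quotient equals h**(-2/3).
quotients_at_zero = [difference_quotient(cube_root, 0.0, h) for h in steps]

# At y = 8 = f(2), where f'(2) = 12, the quotient should be close to 1/12.
quotient_at_eight = difference_quotient(cube_root, 8.0, 1e-6)

for h, q in zip(steps, quotients_at_zero):
    print(f"h = {h:.0e}  quotient at 0 = {q:.4g}")
print("quotient at 8:", quotient_at_eight, " 1/f'(2):", 1 / 12)
```

The quotients at $0$ keep growing as `h` decreases, so no finite derivative exists there, whereas the quotient at $8$ agrees with the formula $(f^{-1})'(y)=1/f'(x)$.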
