Graph of a convex function and normal vectors

calculusconvex-analysisconvex-geometryreal-analysis

Recall that the graph of a function $f:\mathbb R^n\to\mathbb R$ is
$$
G(f)=\{ (x,y)\in\mathbb R^n\times \mathbb R \mid y =f(x)\}
$$

Suppose $f$ is a piece-wise smooth convex function.

Fix two points on the graph $P,Q\in G(f)$. We pick up two inner normal vectors $\overrightarrow N_P$ and $\overrightarrow N_Q$ at $P$ and $Q$ contained in the subset $\mathbb R^n\times \{1\}\subset \mathbb R^{n+1}$.

The normal vectors mean that $\overrightarrow N_P$ or $\overrightarrow N_Q$ is orthogonal to the tangent vector at $P$ or $Q$)

Question: Do we always have $\langle \overrightarrow N_Q -\overrightarrow{N_P}, \overrightarrow{PQ}\rangle <0$?

I ask this because of the following example. If $f(x)=x^2$ and $P=(x_0,x_0^2), Q=(x_1,x_1^2)$, then
$\overrightarrow N_P=(-2x_0,1)$ and $\overrightarrow N_Q=(-2x_1,1)$. Also, we have $\overrightarrow N_Q -\overrightarrow{N_P}=(-2(x_1-x_0),0)$ and $\overrightarrow{PQ}=(x_1-x_0, x_1^2-x_0^2)$, implying that $\langle \overrightarrow N_Q -\overrightarrow{N_P}, \overrightarrow{PQ}\rangle=-2(x_1-x_0)^2<0$

Best Answer

First let us take the case $n=1$, i.e., $f\colon\mathbb{R}\to\mathbb{R}$.

Up to some stretching and translating, let us say that $P=(0,0)$, $Q=(1,f(1))$, $N_P=(-f'(0),1)$, $N_Q=(-f'(1),1)$. Then $$N_Q-N_P=(f'(0)-f'(1),0)$$ and $$Q-P=Q=(1,f(1))$$ so $$\langle N_Q-N_P,Q-P\rangle=f'(0)-f'(1)$$

Now you should know that the derivative of a convex function is non-decreasing (see details below, as I could not find the proof of this here at MSE), so $f'(0)-f'(1)\leq 0$.


For general $f\colon\mathbb{R}\to\mathbb{R}$, a similar argument will yield the result. Or you can "stretch and translate", as in the beginning of the solution: If $P=(x_0,f(x_0))$ and $Q=(x_1,f(x_1))$, apply the case above to $g(x)=f((x_1-x_0)x+x_0)-f(x_0)$ and rewrite the normals of $f$ in terms of the normals of $g$.


As for the fact I mentioned about derivatives of convex functions, we need the following general fact (see Prob. 23, Chap. 4 in Baby Rudin: Every convex function is continuous and every increasing convex function of a convex function is convex):

if $f$ is convex on an interval containing $x_1<x_2<x_3$, then $$\frac{f(x_2)-f(x_1)}{x_2-x_1}\leq\frac{f(x_3)-f(x_1)}{x_3-x_1}\leq\frac{f(x_3)-f(x_2)}{x_3-x_2}$$ (i.e., the slope of the secans increase when you move them to the right)

Then given $x_0<x_1$, we have, for all $x_0<z<w<x_1$, \begin{align*} \frac{f(z)-f(x_0)}{z-x_0} &\leq\frac{f(x_1)-f(x_0)}{x_1-x_0}\\ &\leq\frac{f(x_1)-f(w)}{x_1-w}\\ &=\frac{f(w)-f(x_1)}{w-x_1} \end{align*} Letting $z\to x_0^+$ and $w\to x_1^-$, we conclude $$f'(x_0)\leq f'(x_1)$$


For general $n$, let $f\colon\mathbb{R}^n\to\mathbb{R}$ convex. The tangent space at a point $(x_0,f(x_0))$ of the graph is given by vectors of the form $(x,\nabla f(x_0)\cdot x)$ (where $\nabla f$ is the gradient of $f$ and "$\cdot$" denotes the usual inner product). It follows that the inner normal vectors are given by $(-\nabla f(x_0),1)$.

Let $P=(x_0,f(x_0))$ and $Q=(x_1,f(x_1))$ be given, with $x_0,x_1\in\mathbb{R}^n$. Then $$\langle N_Q-N_P,Q-P\rangle=(\nabla f(x_1)-\nabla f(x_0))\cdot (x_0-x_1)\tag{$\bigstar$}$$

The following argument is a standard way of transforming a multivariable problem into a single-variable problem: Restrict your function to a line!

Consider the function $g\colon\mathbb{R}\to\mathbb{R}$ given by $g(\lambda)=f((1-\lambda)x_0+\lambda x_1)$. Then $g$ is convex. The single variable case yields $g'(1)\geq g'(0)$.

Let us use the chain rule to calculate $g'$ in terms of $\nabla f$ (I can never remember the formulas, so skip this if you want). I'll denote the derivative of a function $F\colon\mathbb{R}^N\to\mathbb{R}^M$ at a point $x_0$ by $dF_{x_0}$.

Let $h\colon\mathbb{R}\to\mathbb{R}^n$, $h(\lambda)=(1-\lambda)x_0+\lambda x_1=x_0+\lambda(x_1-x_0)$. Then

  • $dh_\lambda(t)=t(x_1-x_0)$ for any $\lambda$
  • $df_{x_0}(x)=\nabla f(x_0)\cdot x$.
  • $dg_\lambda(t)=d(f\circ h)_\lambda(t)=df_{h(\lambda)}(dh_{\lambda}(t))=\nabla f(h(\lambda))\cdot(x_1-x_0)$
  • $g'(\lambda)=dg_\lambda(1)$

So in particular $$g'(1)=\nabla f(h(1))\cdot(x_1-x_0)=\nabla f(x_1)\cdot(x_1-x_0)$$ and similarly $$g'(0)=\nabla f(x_0)\cdot (x_1-x_0)$$

Using these equalities along with $g'(0)\leq g'(1)$ and equation ($\bigstar$), we conclude that $\langle N_Q-N_P,Q-P\rangle\leq 0$, as we wanted.

Related Question