Prove that if $\partial^2 f(a)$ exists then it is a symmetric bilinear map

derivatives, proof-verification, proof-writing, real-analysis

I'm trying to prove the symmetry of the second derivative (if it exists).

Let $E$ and $F$ be Banach spaces, and $f:E \to F$ such that $\partial^2 f(a)$ exists. Prove that $\partial^2 f(a)$ is a symmetric bilinear map.

Actually, my textbook uses the stronger assumption that $\partial^2 f$ is continuous at $a$. But the proof in this note does not require the continuity of $\partial^2 f$ at $a$. Unfortunately, that proof uses the Taylor expansion, which my textbook leaves until the next chapter.
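
For a sense of the gap between the two hypotheses, here is a one-variable illustration (an example chosen here, not taken from the textbook or the note). Let $f:\mathbb R \to \mathbb R$ with $f(x) = x^4 \sin(1/x)$ for $x \neq 0$ and $f(0)=0$. Then $$f'(x) = 4x^3 \sin(1/x) - x^2\cos(1/x) \quad (x \neq 0), \qquad f'(0)=0, \qquad f''(0) = \lim_{x\to 0} \frac{f'(x)}{x} = 0,$$ while for $x \neq 0$ $$f''(x) = 12x^2\sin(1/x) - 6x\cos(1/x) - \sin(1/x),$$ which has no limit as $x \to 0$. So $\partial^2 f(0)$ exists although $\partial^2 f$ is not continuous at $0$.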

In my attempt below, I use the ideas in that note, but I don't use the Taylor expansion. Because I want to check whether my understanding is correct, I write in detail, and thus my proof is quite long. I'm sorry about that.

Could you please verify whether my attempt on this well-known result is fine or contains logical gaps/errors? Any suggestion is greatly appreciated!


My attempt:

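For $a,h,k \in E$, consider the map $$F:\mathbb R \to F, \quad t \mapsto f(a+t h+t k)-f(a+t h)-f(a+t k)+f(a)$$ (the symbol $F$ denotes both this map and the target space; context distinguishes them) and, for each fixed $t$, the map $$g:\mathbb R \to F, \quad s\mapsto f(a+s h+t k)-f(a+s h)$$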

Because $\partial^2 f(a)$ exists, there is a neighborhood $\mathcal U_1$ of $a$ such that $\partial f(x)$ exists for all $x \in \mathcal U_1$. Hence $f$ is differentiable on $\mathcal U_1$. It follows that there is a neighborhood $\mathcal U_2$ of $0$ such that $g$ is differentiable on $\mathcal U_2$.

By the Mean Value Theorem, we have $F(t) = g(t)-g(0) = \partial g(\theta) (t)$ for some $\theta$ between $0$ and $t$. By the chain rule, we have $$\partial g(\theta):\mathbb R \to F, \quad l \mapsto \partial f(a+\theta h +t k) (hl) - \partial f(a+\theta h)(hl)$$

Hence $$\begin{aligned} \partial g(\theta) (t) &= \partial f(a+\theta h +t k) (ht) - \partial f(a+\theta h)(ht)\\ &= t \partial f(a+\theta h +t k) (h) - t \partial f(a+\theta h)(h)
\end{aligned}$$

Because $\partial f$ is differentiable at $a$, we have $$\partial f(x) = \partial f(a) + \partial^2f(a)(x-a) + \|x-a\| \cdot r(x) \quad \text{for all} \quad x \in \mathcal U_1$$ where $r:\mathcal U_1 \to \mathcal L(E,F)$ is continuous at $a$ and $r(a)=0$.

It follows that $$\begin{aligned} \partial f(a+\theta h +t k) &= \partial f(a) + \partial^2f(a)(\theta h +t k) + \|\theta h +t k\| \cdot r (a+\theta h +t k) \\ \partial f(a+\theta h) &= \partial f(a) + \partial^2f(a)(\theta h) + \|\theta h\| \cdot r(a+\theta h) \end{aligned}$$

and consequently $$\begin{aligned} F(t) &= t \left [ \partial^2f(a)(\theta h +t k) - \partial^2f(a)(\theta h)\right ] (h) \\ & \quad\quad + t \|\theta h +t k\| \cdot r (a+\theta h +t k) (h) - t \|\theta h\| \cdot r(a+\theta h) (h) \\ &= t \partial^2f(a)(t k) (h) + t \|\theta h +t k\| \cdot r (a+\theta h +t k) (h) - t \|\theta h\| \cdot r(a+\theta h) (h) \\ &= t^2 \partial^2f(a)(k) (h) +t \|\theta h +t k\| \cdot r (a+\theta h +t k) (h) - t \|\theta h\| \cdot r(a+\theta h) (h) \end{aligned}$$

Let $M = \|h\|+\|k\|$. We have $$\begin{aligned} \big\|\|\theta h +t k\| \cdot r (a+\theta h +t k) (h)\big\| &= \|\theta h +t k\| \cdot \|r (a+\theta h +t k) (h)\| \\ &\le (|\theta| \cdot \|h\|+ |t| \cdot \|k\|) \cdot \|r (a+\theta h +t k)\| \cdot \|h\| \\ &\le (M|\theta|+ M|t|) \cdot \|r (a+\theta h +t k)\| \cdot M \\ &\le (M|t|+ M|t|) \cdot \|r (a+\theta h +t k)\| \cdot M \\ &= 2M^2|t| \cdot \|r (a+\theta h +t k)\| \end{aligned}$$

It follows that $$\begin{aligned} \lim_{t \to 0} \left \|\frac{\|\theta h +t k\| \cdot r (a+\theta h +t k) (h)}{t} \right \| &= \lim_{t \to 0} \frac{\big \| \|\theta h +t k\| \cdot r (a+\theta h +t k) (h) \big \|}{|t|} \\ &\le \lim_{t \to 0} \frac{2M^2|t| \cdot \|r (a+\theta h +t k)\|}{|t|} \\ &= \lim_{t \to 0} 2M^2 \|r (a+\theta h +t k)\| \end{aligned}$$

Since $\theta$ lies between $0$ and $t$, we have $\theta \to 0$ as $t \to 0$. Thus $a+\theta h +t k \to a$ as $t \to 0$. Moreover, $r$ is continuous at $a$ and $r(a)=0$. Hence $\lim_{t \to 0} 2M^2 \|r (a+\theta h +t k)\| = 2M^2 \lim_{t \to 0}\|r (a+\theta h +t k)\| = 0$. It follows that $$\lim_{t \to 0} \left \|\frac{\|\theta h +t k\| \cdot r (a+\theta h +t k) (h)}{t} \right \| = 0$$ and consequently $$\lim_{t \to 0} \frac{\|\theta h +t k\| \cdot r (a+\theta h +t k) (h)}{t} = 0$$

With similar reasoning, we get $$\lim_{t \to 0} \frac{ \|\theta h\| \cdot r(a+\theta h) (h)}{t} =0$$

As such, we have $$\begin{aligned} &\lim_{t \to 0} \frac{F(t)}{t^2} \\ = &\lim_{t \to 0} \frac{t^2 \partial^2f(a)(k) (h) +t \|\theta h +t k\| \cdot r (a+\theta h +t k) (h) - t \|\theta h\| \cdot r(a+\theta h) (h)}{t^2} \\ = &\lim_{t \to 0} \left (\partial^2f(a)(k) (h)+ \frac{\|\theta h +t k\| \cdot r (a+\theta h +t k) (h)}{t} - \frac{ \|\theta h\| \cdot r(a+\theta h) (h)}{t} \right ) \\ = & \partial^2f(a)(k) (h)\end{aligned}$$

Consider $\bar g:\mathbb R \to F, \quad s\mapsto f(a+t h+s k)-f(a+s k)$. With similar reasoning as above, we get $$\lim_{t \to 0} \frac{F(t)}{t^2} = \partial^2f(a)(h) (k)$$ As such, $ \partial^2f(a)(k) (h) = \partial^2f(a)(h) (k)$.

Best Answer

I've just figured out that $F$ is a Banach space, which is not necessarily $\mathbb R$. As such, I must use the integral form of MVT. Below is my fix.
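
A standard illustration of why the equality form of the Mean Value Theorem can fail for vector-valued maps (the map $\gamma$ below is an example chosen here, not part of the original argument): take $\gamma:\mathbb R \to \mathbb R^2$, $\gamma(s) = (\cos s, \sin s)$, with the Euclidean norm on $\mathbb R^2$. Then $$\gamma(2\pi)-\gamma(0) = 0, \qquad \|\partial \gamma(\theta)(2\pi)\| = 2\pi \,\|(-\sin\theta,\cos\theta)\| = 2\pi \quad \text{for every } \theta,$$ so no $\theta$ satisfies $\gamma(2\pi)-\gamma(0) = \partial\gamma(\theta)(2\pi)$. The integral form still holds in this example: $$\int_0^1 \partial\gamma(2\pi\theta)(2\pi)\,\mathrm d\theta = 2\pi\int_0^1 (-\sin 2\pi\theta, \cos 2\pi\theta)\,\mathrm d\theta = (0,0) = \gamma(2\pi)-\gamma(0).$$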


My updated proof:

For $a,h,k \in E$, consider the map $$F:\mathbb R \to F, \quad t \mapsto f(a+t h+t k)-f(a+t h)-f(a+t k)+f(a)$$ and, for each fixed $t$, the map $$g:\mathbb R \to F, \quad s\mapsto f(a+s h+t k)-f(a+s h)$$

Because $\partial^2 f(a)$ exists, there is a neighborhood $\mathcal U_1$ of $a$ such that $\partial f(x)$ exists for all $x \in \mathcal U_1$. Hence $f$ is differentiable on $\mathcal U_1$. It follows that there is a neighborhood $\mathcal U_2$ of $0$ such that $g$ is differentiable on $\mathcal U_2$.

By the Mean Value Theorem for vector-valued functions, we have $$F(t) = g(t)-g(0) = \int_0^1\partial g(\theta t) (t) \, \mathrm{d} \theta$$ By the chain rule, we have $$\partial g(\theta):\mathbb R \to F, \quad l \mapsto \partial f(a+\theta h +t k) (hl) - \partial f(a+\theta h)(hl)$$

Hence $$\begin{aligned} \partial g(\theta t) (t) &= \partial f(a+\theta th +t k) (ht) - \partial f(a+\theta th)(ht)\\ &= t \partial f(a+\theta th +t k) (h) - t \partial f(a+\theta th)(h) \\ &= t \Big ( \partial f(a+\theta th +t k) - \partial f(a+\theta th) \Big )(h) \end{aligned}$$

Because $\partial f$ is differentiable at $a$, we have $$\partial f(x) = \partial f(a) + \partial^2f(a)(x-a) + \|x-a\| \cdot r(x) \quad \text{for all} \quad x \in \mathcal U_1$$ where $r:\mathcal U_1 \to \mathcal L(E,F)$ is continuous at $a$ and $r(a)=0$.

It follows that $$\begin{aligned} \partial f(a+\theta th +t k) &= \partial f(a) + \partial^2f(a)(\theta th +t k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) \\ \partial f(a+\theta th) &= \partial f(a) + \partial^2f(a)(\theta th) + \|\theta th\| \cdot r(a+\theta th) \end{aligned}$$

and consequently $$\begin{aligned} &\partial f (a+\theta th +t k) - \partial f (a+\theta th)\\ = \quad & \Big( \partial^2 f (a) (\theta th +t k) - \partial^2 f (a) (\theta th) \Big ) \\ & \quad \quad + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \\ = \quad & \partial^2f(a)(t k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \\ = \quad & t \partial^2 f (a)(k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \end{aligned}$$

and consequently $$\begin{aligned} &F(t)\\ = \quad &\int_0^1 t \Big ( t \partial^2 f (a)(k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \Big )(h) \, \mathrm{d} \theta \\ = \quad & t^2 \partial^2 f (a)(k) (h) + t \int_0^1 \|\theta th +t k\| \cdot r (a+\theta th +t k) (h) \, \mathrm{d} \theta - t \int_0^1 \|\theta th\| \cdot r(a+\theta th) (h) \, \mathrm{d} \theta \end{aligned}$$
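
As a sanity check of this decomposition, here is a sketch in the simplest setting, assuming (for illustration only) $E = F = \mathbb R$ and $f(x) = x^3$, with $\mathcal L(\mathbb R,\mathbb R)$ identified with $\mathbb R$. Then $\partial f(x) = 3x^2$, $\partial^2 f(a)(k)(h) = 6akh$, and $$\partial f(x) - \partial f(a) - \partial^2 f(a)(x-a) = 3x^2 - 3a^2 - 6a(x-a) = 3(x-a)^2 = |x-a| \cdot 3|x-a|,$$ so $r(x) = 3|x-a|$. The two remainder integrals combine to $$t \int_0^1 3(\theta th + tk)^2 h \, \mathrm d\theta - t \int_0^1 3(\theta th)^2 h \, \mathrm d\theta = 3th \int_0^1 \big( 2\theta t^2 hk + t^2 k^2 \big) \, \mathrm d\theta = 3t^3 hk(h+k),$$ which agrees with the direct expansion $$(a+th+tk)^3 - (a+th)^3 - (a+tk)^3 + a^3 = 6a\,t^2 hk + 3t^3 hk(h+k).$$ In particular $F(t)/t^2 = 6ahk + 3t\,hk(h+k) \to 6ahk$ as $t \to 0$.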

Let $M = \|h\|+\|k\|$. For all $\theta \in [0,1]$, we have $$\begin{aligned} \big \| \| \theta th +t k\| \cdot r (a+\theta th +t k) (h)\big\| &= \|\theta th +t k\| \cdot \|r (a+\theta th +t k) (h)\| \\ &\le (|\theta t| \cdot \|h\|+ |t| \cdot \|k\|) \cdot \|r (a+\theta th +t k)\| \cdot \|h\| \\ &\le (|t| \cdot \|h\|+ |t| \cdot \|k\|) \cdot \|r (a+\theta th +t k)\| \cdot \|h\| \\ &\le (M|t|+ M|t|) \cdot \|r (a+\theta th +t k)\| \cdot M \\ &= 2M^2|t| \cdot \|r (a+\theta th +t k)\| \end{aligned}$$

It follows that $$\begin{aligned} \lim_{t \to 0} \left \|\frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \right \| &= \lim_{t \to 0} \frac{\big \| \|\theta th +t k\| \cdot r (a+\theta th +t k) (h) \big \|}{|t|} \\ &\le \lim_{t \to 0} \frac{2M^2|t| \cdot \|r (a+\theta th +t k)\|}{|t|} \\ &= \lim_{t \to 0} 2M^2 \|r (a+\theta th +t k)\| \end{aligned}$$

For all $\theta \in [0,1]$, we have $\theta t \to 0$ as $t \to 0$. Thus $a+\theta th +t k \to a$ as $t \to 0$. Moreover, $r$ is continuous at $a$ and $r(a)=0$. Hence $$\lim_{t \to 0} 2M^2 \|r (a+\theta th +t k)\| = 2M^2 \lim_{t \to 0}\|r (a+\theta th +t k)\| = 0$$ It follows that $$\lim_{t \to 0} \left \|\frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \right \| = 0$$ and consequently $$\lim_{t \to 0} \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} = 0$$

Because $\|\theta th +t k\| \le M|t|$ for every $\theta \in [0,1]$ and $r$ is continuous at $a$ with $r(a)=0$, this convergence is uniform in $\theta$: for all $\delta >0$ there is $\epsilon >0$ such that $$\forall |t|<\epsilon, \forall \theta \in [0,1]: \left \| \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t}\right \| <\delta$$ and consequently $$\forall |t|<\epsilon: \int_0^1 \left \| \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t}\right \| \mathrm{d} \theta \le \delta$$

On the other hand, $$\left \|\int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta \right \| \le \int_0^1 \left \| \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \right \| \mathrm{d} \theta $$

Hence $$\forall |t|<\epsilon: \left \| \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \mathrm{d} \theta \right \| <\delta$$ and consequently $$ \lim_{t \to 0} \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta = 0$$

With similar reasoning, we get $$\lim_{t \to 0} \int_0^1 \frac{ \|\theta th\| \cdot r(a+\theta th) (h)}{t} \, \mathrm{d} \theta =0$$

As such, we have $$\begin{aligned} &\lim_{t \to 0} \frac{F(t)}{t^2} \\ = &\lim_{t \to 0} \frac{t^2 \partial^2 f (a)(k) (h) + t \int_0^1 \|\theta th +t k\| \cdot r (a+\theta th +t k) (h) \, \mathrm{d} \theta - t \int_0^1 \|\theta th\| \cdot r(a+\theta th) (h) \, \mathrm{d} \theta}{t^2} \\ = &\lim_{t \to 0} \left (\partial^2f(a)(k) (h)+ \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta - \int_0^1 \frac{ \|\theta th\| \cdot r(a+\theta th) (h)}{t} \, \mathrm{d} \theta \right ) \\ = &\lim_{t \to 0} \partial^2f(a)(k) (h)+ \lim_{t \to 0} \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta - \lim_{t \to 0} \int_0^1 \frac{ \|\theta th\| \cdot r(a+\theta th) (h)}{t} \, \mathrm{d} \theta \\ = & \partial^2f(a)(k) (h)\end{aligned}$$

Consider $\bar g:\mathbb R \to F, \quad s\mapsto f(a+t h+s k)-f(a+s k)$. With similar reasoning as above, we get $$\lim_{t \to 0} \frac{F(t)}{t^2} = \partial^2f(a)(h) (k)$$ As such, $ \partial^2f(a)(k) (h) = \partial^2f(a)(h) (k)$.
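
A quick consistency check on the final identity, under the assumption (made here for illustration) that $f(x) = B(x,x)$ for a bounded bilinear map $B:E\times E \to F$ that need not be symmetric. In that case $$\partial f(x)(u) = B(x,u)+B(u,x), \qquad \partial^2 f(a)(k)(h) = B(k,h)+B(h,k),$$ and expanding by bilinearity gives $$f(a+th+tk)-f(a+th)-f(a+tk)+f(a) = t^2\big(B(h,k)+B(k,h)\big).$$ So here $F(t)/t^2 = B(h,k)+B(k,h)$ exactly, which equals both $\partial^2f(a)(k)(h)$ and $\partial^2f(a)(h)(k)$ even though $B$ itself is not assumed symmetric.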
