Generalize Clairaut-Schwarz theorem to arbitrary order of mixed partial derivatives

multivariable-calculus, partial derivative, real-analysis

After reading the answer here to understand how to apply the difference operator, I've figured out how to generalize my proof of the Clairaut-Schwarz theorem here to mixed partial derivatives of arbitrary order.

Could you please verify if my proof looks fine or contains logical gaps/errors? Thank you so much for your help!

$\textbf{Generalized Clairaut-Schwarz Theorem:}$ Let $X$ be open in $\mathbb R^n$, $f:X \to F$, and $m \in \mathbb N$. Suppose $j_1, j_2, \ldots, j_m \in\{1,\ldots,n\}$ and $\sigma$ is a permutation of $\{1, \ldots, m\}$. If $\partial_{j_1} \partial_{j_2} \cdots \partial_{j_m} f$ is continuous at $a$ and $\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f$ exists in a neighborhood of $a$, then $$\partial_{j_1} \cdots \partial_{j_m} f (a)= \partial_{j_{\sigma(1)}} \cdots \partial_{j_{\sigma(m)}} f(a)$$

In my proof, I use the two lemmas below:

Let $\{e_1,\ldots, e_n\}$ be the standard basis of $\mathbb R^n$. For $h \in \mathbb R$ and $j \in \{1,\ldots,n\}$, we define a map $\Delta_j^h f$ by $$\Delta_j^h f: X \to F, \quad x \mapsto f(x+he_j)-f(x)$$

$\textbf{Lemma 1:}$ $$\partial_{j_1} \cdots \partial_{j_m} f (a) = \lim_{h_1 \to 0} \left ( \lim_{h_2 \to 0} \left( \cdots \left ( \lim_{h_m \to 0} \left( \frac{ \Delta_{j_1}^{h_1} \cdots\Delta_{j_m}^{h_m} f (a)}{h_1 \cdots h_m} \right ) \right ) \cdots \right ) \right)$$

$\textbf{Lemma 2:}$ The finite difference operator is commutative, i.e. $$ \Delta_{j_1}^{h_1} \cdots\Delta_{j_m}^{h_m} f (a) = \Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)$$
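To make Lemma 2 concrete, here is the case $m = 2$ written out directly from the definition of $\Delta_j^h$ (an illustrative expansion only, assuming $h_1, h_2$ are small enough that all four points lie in $X$):

$$\begin{aligned} \Delta_{j_1}^{h_1} \Delta_{j_2}^{h_2} f (a) &= \Delta_{j_2}^{h_2} f (a + h_1 e_{j_1}) - \Delta_{j_2}^{h_2} f (a) \\ &= f(a + h_1 e_{j_1} + h_2 e_{j_2}) - f(a + h_1 e_{j_1}) - f(a + h_2 e_{j_2}) + f(a) \end{aligned}$$

The last expression is unchanged when the pairs $(j_1, h_1)$ and $(j_2, h_2)$ are swapped, so it also equals $\Delta_{j_2}^{h_2} \Delta_{j_1}^{h_1} f (a)$.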


$\textbf{My attempt:}$

By Mean Value Theorem, we have
$$\begin{aligned} & \quad \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}}\\
=& \quad \frac{\Delta_{j_1}^{h_1} \cdots\Delta_{j_{m}}^{h_{m}} f (a)}{h_1 \cdots h_m} \quad \text{by} \,\, \textbf{Lemma 2} \\
=& \quad \frac{\partial_{j_{1}} \cdots \partial_{j_{m}} f (a + t_1 e_{j_1} + \cdots + t_{m} e_{j_{m}}) h_1 \cdots h_{m}}{h_1 \cdots h_m} \quad \text{by} \,\, \textbf{MVT} \\ =& \quad\partial_{j_{1}} \cdots \partial_{j_{m}} f (a + t_1 e_{j_1} + \cdots + t_{m} e_{j_{m}}) \end{aligned}$$
in which

$$\begin{aligned} \min\{0,h_1\} < t_1 < \max\{0,h_1\} \\ \vdots\quad\quad\quad\quad\quad\quad \,\,\, \\ \min\{0,h_m\} < t_m < \max\{0,h_m\} \end{aligned}$$

Hence

$$\begin{aligned} & \quad \left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \|\\
=& \quad \left \| \partial_{j_{1}} \cdots \partial_{j_{m}} f (a + t_1 e_{j_1} + \cdots + t_{m} e_{j_{m}}) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \|\end{aligned}$$

Let $t = |t_1| + \cdots+|t_{m}|$ and $h= |h_1| + \cdots+|h_{m}|$. It follows from the continuity of $\partial_{j_1} \partial_{j_2} \cdots \partial_{j_m} f$ at $a$ that for all $\delta > 0$ there is $\epsilon > 0$ such that $$\left \| \partial_{j_{1}} \cdots \partial_{j_{m}} f (a + t_1 e_{j_1} + \cdots + t_{m} e_{j_{m}}) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| < \delta$$ for all $t < \epsilon$. Since $|t_i| < |h_i|$ for each $i$, we have $t < h$. As such, for all $h < \epsilon$, we have $$ \left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| < \delta$$

Taking the limit $h_{\sigma(m)} \to 0$, we have $$\lim_{h_{\sigma(m)} \to 0} \left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$

and consequently $$ \left \| \lim_{h_{\sigma(m)} \to 0} \left (\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} \right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$

and consequently $$\left \| \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m-1)}}^{h_{\sigma(m-1)}} \partial_{j_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m-1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta \quad \text{by} \,\, \textbf{Lemma 1}$$

Iterating this process of taking limits, we get $$\left \| \lim_{h_{\sigma(1)} \to 0} \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \left ( \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right) (a)}{h_{\sigma(1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$ or equivalently $$\left \| \lim_{h_{\sigma(1)} \to 0} \frac{ \left ( \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right) (a + h_{\sigma(1)} e_{j_{\sigma(1)}}) - \left (\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right)(a)}{h_{\sigma(1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$

For all $\delta >0$, there is $\epsilon >0$ such that for all $|h_{\sigma(1)}| < \epsilon$, the last inequality holds. It follows that $$\partial_{j_{\sigma(1)}}\left (\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right)(a) = \partial_{j_{1}} \cdots \partial_{j_{m}} f (a)$$ and consequently $$\partial_{j_{\sigma(1)}} \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f (a) = \partial_{j_{1}} \cdots \partial_{j_{m}} f (a)$$

This completes the proof.

Best Answer

Thanks to @Pietro for pointing out my fatal misunderstanding of the MVT for vector-valued functions. I've figured out a fix using the integral form of the MVT.


$\textbf{My updated proof:}$

By $\textbf{Lemma 2}$, we have $$ \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} = \frac{\Delta_{j_1}^{h_1} \cdots\Delta_{j_{m}}^{h_{m}} f (a)}{h_1 \cdots h_m}$$
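As a sanity check of this step (not part of the proof), the sketch below evaluates both orders of the mixed difference for a sample function of my own choosing, $f(x_1,x_2) = x_1^2 x_2$, using exact rational arithmetic. The helper `delta`, the sample point, and the step sizes are all illustrative assumptions; indices are 0-based in the code.

```python
from fractions import Fraction


def delta(f, j, h):
    """Return the map x -> f(x + h*e_j) - f(x); x is a tuple, j is 0-based."""
    def df(x):
        shifted = tuple(xi + h if i == j else xi for i, xi in enumerate(x))
        return f(shifted) - f(x)
    return df


def f(x):
    # Sample function (an assumption for illustration): f(x1, x2) = x1^2 * x2
    return x[0] ** 2 * x[1]


a = (Fraction(1), Fraction(2))
h1, h2 = Fraction(1, 10), Fraction(1, 7)

# Delta_1^{h1} Delta_2^{h2} f(a): the inner operator is applied first
lhs = delta(delta(f, 1, h2), 0, h1)(a)
# Delta_2^{h2} Delta_1^{h1} f(a): the same operators in the other order
rhs = delta(delta(f, 0, h1), 1, h2)(a)

# Difference quotient; for this f it works out to 2*a1 + h1
quotient = lhs / (h1 * h2)
print(lhs == rhs, quotient)
```

For this particular $f$ the quotient equals $2a_1 + h_1$, which tends to $\partial_1 \partial_2 f(a) = 2a_1$ as $h_1 \to 0$.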

By the integral form of the Mean Value Theorem for vector-valued functions, we have $$\begin{aligned} & \quad \Delta_{j_1}^{h_1} \cdots \Delta_{j_m}^{h_m} f (a) \\ =& \quad \Delta_{j_1}^{h_1} \cdots \Delta_{j_{m-1}}^{h_{m-1}} \Delta_{j_{m}}^{h_{m}} f (a) \\ =& \quad \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a+ h_{m} e_{j_{m}}) - \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a) \\ = & \quad \int_0^1 \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a+ t_m h_m e_{j_{m}})h_{m} \, \mathrm{d} t_m \\ \end{aligned}$$

Similarly, $$\begin{aligned} \quad & \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a+ t_m h_m e_{j_{m}}) \\ =\quad & \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-2}}^{h_{m-2}} f \left (a+ t_m h_m e_{j_{m}} + h_{m-1} e_{j_{m-1}} \right)\\& \quad \quad \quad \quad - \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-2}}^{h_{m-2}} f \left (a+ t_m h_m e_{j_{m}} \right) \\ = \quad & \int_0^1 \partial_{j_{m}} \partial_{j_{m-1}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-2}}^{h_{m-2}} f \left (a+ t_m h_m e_{j_{m}} + t_{m-1} h_{m-1} e_{j_{m-1}}\right ) h_{m-1} \, \mathrm{d} t_{m-1} \\ \end{aligned}$$

Iterating the use of the integral form of the Mean Value Theorem for vector-valued functions, we get $$\Delta_{j_1}^{h_1} \cdots \Delta_{j_m}^{h_m} f (a) = {\int_0^1 \cdots \int_0^1} \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) h_{1} \cdots h_{m} \, \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m}$$ and consequently $$ \frac{\Delta_{j_1}^{h_1} \cdots\Delta_{j_{m}}^{h_{m}} f (a)}{h_1 \cdots h_m} = {\int_0^1 \cdots \int_0^1} \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right )\, \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m}$$ and consequently $$\begin{aligned} &\left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \|\\ = \quad & \left \| {\int_0^1 \cdots \int_0^1} \Big ( \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big) \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m} \right \| \\ \le \quad & {\int_0^1 \cdots \int_0^1} \Big \| \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big \| \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m} \\ \end{aligned}$$
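To see the integral representation at work, here is a small worked example of my own (not part of the proof), assuming $n = m = 2$, $F = \mathbb R$, $j_1 = 1$, $j_2 = 2$, and the sample function $f(x_1, x_2) = x_1^2 x_2$, so that $\partial_1 \partial_2 f (x) = 2 x_1$. For $a = (a_1, a_2)$, the left-hand side is

$$\frac{\Delta_{1}^{h_1} \Delta_{2}^{h_2} f (a)}{h_1 h_2} = \frac{(a_1+h_1)^2 h_2 - a_1^2 h_2}{h_1 h_2} = 2 a_1 + h_1$$

and the right-hand side is

$$\int_0^1 \int_0^1 \partial_1 \partial_2 f (a + t_1 h_1 e_1 + t_2 h_2 e_2) \, \mathrm{d} t_1 \, \mathrm{d} t_2 = \int_0^1 \int_0^1 2 (a_1 + t_1 h_1) \, \mathrm{d} t_1 \, \mathrm{d} t_2 = 2 a_1 + h_1$$

Both sides agree, and both tend to $\partial_1 \partial_2 f (a) = 2 a_1$ as $h_1, h_2 \to 0$.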

Let $h= |h_1| + \cdots+|h_{m}|$. It follows from the continuity of $\partial_{j_1} \partial_{j_2} \cdots \partial_{j_m} f$ at $a$ that for all $\delta > 0$ there is $\epsilon > 0$ such that $$\Big \| \partial_{j_{1}} \cdots \partial_{j_{m}} f (a + t_1 h_1 e_{j_1} + \cdots + t_{m} h_m e_{j_{m}}) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big \| < \delta$$ for all $h < \epsilon$ and all $(t_1,\ldots,t_m) \in [0,1]^m$, because the point $a + t_1 h_1 e_{j_1} + \cdots + t_{m} h_m e_{j_{m}}$ lies within distance $h$ of $a$. The choice of $\epsilon$ does not depend on $(t_1,\ldots,t_m)$. As such, for all $h < \epsilon$, we have $$ {\int_0^1 \cdots \int_0^1} \Big \| \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big \| \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m} < \delta$$ and consequently $$\left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| < \delta$$

Taking the limit $h_{\sigma(m)} \to 0$, we have $$\lim_{h_{\sigma(m)} \to 0} \left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$

and consequently $$ \left \| \lim_{h_{\sigma(m)} \to 0} \left (\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} \right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$

and consequently $$\left \| \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m-1)}}^{h_{\sigma(m-1)}} \partial_{j_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m-1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta \quad \text{by} \,\, \textbf{Lemma 1}$$

Iterating this process of taking limit, we get $$\left \| \lim_{h_{\sigma(1)} \to 0} \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \left ( \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right) (a)}{h_{\sigma(1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$ or equivalently $$\left \| \lim_{h_{\sigma(1)} \to 0} \frac{ \left ( \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right) (a + h_{\sigma(1)} e_{j_{\sigma(1)}}) - \left (\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right)(a)}{h_{\sigma(1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$

For all $\delta >0$, there is $\epsilon >0$ such that for all $|h_{\sigma(1)}| < \epsilon$, the last inequality holds. It follows that $$\partial_{j_{\sigma(1)}}\left (\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right)(a) = \partial_{j_{1}} \cdots \partial_{j_{m}} f (a)$$ and consequently $$\partial_{j_{\sigma(1)}} \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f (a) = \partial_{j_{1}} \cdots \partial_{j_{m}} f (a)$$

This completes the proof.