I've just figured out that $F$ is a Banach space, which is not necessarily $\mathbb R$. As such, I must use the integral form of MVT. Below is my fix.
My updated proof:
For $a,h,k \in E$, consider the maps $$F:\mathbb R \to F, \quad t \mapsto f(a+t h+t k)-f(a+t h)-f(a+t k)+f(a)$$ and $$g:\mathbb R \to F, \quad s\mapsto f(a+s h+t k)-f(a+s h)$$
Because $\partial^2 f(a)$ exists, there is a neighborhood $\mathcal U_1$ of $a$ such that $\partial f(x)$ exists for all $x \in \mathcal U_1$. Hence $f$ is differentiable on $\mathcal U_1$. It follows that there is a neighborhood $\mathcal U_2$ of $0$ such that $g$ is differentiable on $\mathcal U_2$.
By Mean Value Theorem for vector-valued function, we have $$F(t) = g(t)-g(0) = \int_0^1\partial g(\theta t) (t) \, \mathrm{d} \theta$$ By the chain rule, we have $$\partial g(\theta):\mathbb R \to F, \quad l \to \partial f(a+\theta h +t k) (hl) - \partial f(a+\theta h)(hl)$$
Hence $$\begin{aligned} \partial g(\theta t) (t) &= \partial f(a+\theta th +t k) (ht) - \partial f(a+\theta th)(ht)\\ &= t \partial f(a+\theta th +t k) (h) - t \partial f(a+\theta th)(h) \\
&= t \Big ( \partial f(a+\theta th +t k) - \partial f(a+\theta th) \Big )(h)
\end{aligned}$$
Because $\partial f$ is differentiable at $a$, we have $$\partial f(x) = \partial f(a) + \partial^2f(a)(x-a) + \|x-a\| \cdot r(x) \quad \text{for all} \quad x \in E$$ where $r:E \to \mathcal L(E,F)$ is continuous at $a$ and $r(a)=0$.
It follows that $$\begin{aligned} \partial f(a+\theta th +t k) &= \partial f(a) + \partial^2f(a)(\theta th +t k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) \\ \partial f(a+\theta th) &= \partial f(a) + \partial^2f(a)(\theta th) + \|\theta th\| \cdot r(a+\theta th) \end{aligned}$$
and consequently $$\begin{aligned} &\partial f (a+\theta th +t k) - \partial f (a+\theta th)\\
= \quad & \Big( \partial^2 f (a) (\theta th +t k) - \partial^2 f (a) (\theta th) \Big ) \\
& \quad \quad + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \\
= \quad & \partial^2f(a)(t k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \\
= \quad & t \partial^2 f (a)(k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \end{aligned}$$
and consequently $$\begin{aligned} &F(t)\\
= \quad &\int_0^1 t \Big ( t \partial^2 f (a)(k) + \|\theta th +t k\| \cdot r (a+\theta th +t k) - \|\theta th\| \cdot r(a+\theta th) \Big )(h) \, \mathrm{d} \theta \\
= \quad & t^2 \partial^2 f (a)(k) (h) + t \int_0^1 \|\theta th +t k\| \cdot r (a+\theta th +t k) (h) \, \mathrm{d} \theta - t \int_0^1 \|\theta th\| \cdot r(a+\theta th) (h) \, \mathrm{d} \theta \end{aligned}$$
Let $M = \|h\|+\|k\|$. For all $\theta \in [0,1]$, we have $$\begin{aligned} \big \| \| \theta th +t k\| \cdot r (a+\theta th +t k) (h)\big\| &= \|\theta th +t k\| \cdot \|r (a+\theta th +t k) (h)\| \\
&\le (|\theta t| \cdot \|h\|+ |t| \cdot \|k\|) \cdot \|r (a+\theta th +t k)\| \cdot \|h\| \\
&\le (|t| \cdot \|h\|+ |t| \cdot \|k\|) \cdot \|r (a+\theta th +t k)\| \cdot \|h\| \\
&\le (M|t|+ M|t|) \cdot \|r (a+\theta h +t k)\| \cdot M \\
&= 2M^2|t| \cdot \|r (a+\theta h +t k)\| \end{aligned}$$
It follows that $$\begin{aligned} \lim_{t \to 0} \left \|\frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \right \| &= \lim_{t \to 0} \frac{\big \| \|\theta th +t k\| \cdot r (a+\theta th +t k) (h) \big \|}{|t|} \\ &\le \lim_{t \to 0} \frac{2M^2|t| \cdot \|r (a+\theta th +t k)\|}{|t|} \\ &= \lim_{t \to 0} 2M^2 \|r (a+\theta th +t k)\| \end{aligned}$$
For all $\theta \in [0,1]$, we have $\theta t \to 0$ as $t \to 0$. Thus $a+\theta th +t k \to a$ as $t \to 0$. Moreover, $r$ is continuous at $a$ and $r(a)=0$. Hence $$\lim_{t \to 0} 2M^2 \|r (a+\theta th +t k)\| = 2M^2 \lim_{t \to 0}\|r (a+\theta th +t k)\| = 0$$ It follows that $$\lim_{t \to 0} \left \|\frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \right \| = 0$$ and consequently $$\lim_{t \to 0} \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} = 0$$
It follows that for all $\delta >0$ there is $\epsilon >0$ such that $$\forall |t|<\epsilon, \forall \theta \in [0,1]: \left \| \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t}\right \| <\delta$$ and consequently $$\forall |t|<\epsilon: \int_0^1 \left \| \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t}\right \| \mathrm{d} \theta <\delta$$
On the other hand, $$\left \|\int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta \right \| \le \int_0^1 \left \| \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \right \| \mathrm{d} \theta $$
Hence $$\forall |t|<\epsilon: \left \| \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \mathrm{d} \theta \right \| <\delta$$ and consequently $$ \lim_{t \to 0} \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta = 0$$
With similar reasoning, we get $$\lim_{t \to 0} \int_0^1 \frac{ \|\theta h\| \cdot r(a+\theta h) (h)}{t} \, \mathrm{d} \theta =0$$
As such, we have $$\begin{aligned} &\lim_{t \to 0} \frac{F(t)}{t^2} \\
= &\lim_{t \to 0} \frac{t^2 \partial^2 f (a)(k) (h) + t \int_0^1 \|\theta th +t k\| \cdot r (a+\theta th +t k) (h) \, \mathrm{d} \theta - t \int_0^1 \|\theta th\| \cdot r(a+\theta th) (h) \, \mathrm{d} \theta}{t^2} \\
= &\lim_{t \to 0} \left (\partial^2f(a)(k) (h)+ \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta - \int_0^1 \frac{ \|\theta h\| \cdot r(a+\theta h) (h)}{t} \, \mathrm{d} \theta \right ) \\
= &\lim_{t \to 0} \partial^2f(a)(k) (h)+ \lim_{t \to 0} \int_0^1 \frac{\|\theta th +t k\| \cdot r (a+\theta th +t k) (h)}{t} \, \mathrm{d} \theta - \lim_{t \to 0} \int_0^1 \frac{ \|\theta h\| \cdot r(a+\theta h) (h)}{t} \, \mathrm{d} \theta \\
= & \partial^2f(a)(k) (h)\end{aligned}$$
Consider $\bar g:\mathbb R \to F, \quad s\mapsto f(a+t h+s k)-f(a+s k)$. With similar reasoning as above, we get $$\lim_{t \to 0} \frac{F(t)}{t^2} = \partial^2f(a)(h) (k)$$ As such, $ \partial^2f(a)(k) (h) = \partial^2f(a)(h) (k)$.
Thanks to @Pietro for pointing out my fatal misunderstanding of MVT vector-valued function. I've figured a fixed by use the integral form of MVT.
$\textbf{My updated proof:}$
By $\textbf{Lemma 2}$, we have $$ \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}}
= \frac{\Delta_{j_1}^{h_1} \cdots\Delta_{j_{m}}^{h_{m}} f (a)}{h_1 \cdots h_m}$$
By the integral form of Mean Value Theorem for vector-valued function, we have $$\begin{aligned} & \quad \Delta_{j_1}^{h_1} \cdots \Delta_{j_m}^{h_m} f (a) \\
=& \quad \Delta_{j_1}^{h_1} \cdots \Delta_{j_{m-1}}^{h_{m-1}} \Delta_{j_{m}}^{h_{m}} f (a) \\
=& \quad \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a+ h_{m} e_{j_{m}}) - \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a) \\
= & \quad \int_0^1 \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a+ t_m h_m e_{j_{m}})h_{m} \, \mathrm{d} t_m \\ \end{aligned}$$
Similarly, $$\begin{aligned} \quad & \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-1}}^{h_{m-1}} f(a+ t_m h_m e_{j_{m}}) \\
=\quad & \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-2}}^{h_{m-2}} f \left (a+ t_m h_m e_{j_{m}} + h_{m-1} e_{j_{m-1}} \right)\\& \quad \quad \quad \quad - \partial_{j_{m}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-2}}^{h_{m-2}} f \left (a+ t_m h_m e_{j_{m}} \right) \\
= \quad & \int_0^1 \partial_{j_{m}} \partial_{j_{m-1}} \Delta_{j_{1}}^{h_{1}} \cdots \Delta_{j_{m-2}}^{h_{m-2}} f \left (a+ t_m h_m e_{j_{m}} + t_{m-1} h_{m-1} e_{j_{m-1}}\right ) h_{m-1} \, \mathrm{d} t_{m-1} \\ \end{aligned}$$
Iterating the use of the integral form of Mean Value Theorem for vector-valued function, we get $$\Delta_{j_1}^{h_1} \cdots \Delta_{j_m}^{h_m} f (a) = {\int_0^1 \cdots \int_0^1} \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) h_{1} \cdots h_{m} \, \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m}$$ and consequently $$ \frac{\Delta_{j_1}^{h_1} \cdots\Delta_{j_{m}}^{h_{m}} f (a)}{h_1 \cdots h_m} = {\int_0^1 \cdots \int_0^1} \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right )\, \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m}$$ and consequently $$\begin{aligned} &\left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \|\\
= \quad & \left \| {\int_0^1 \cdots \int_0^1} \Big ( \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big) \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m} \right \| \\
\le \quad & {\int_0^1 \cdots \int_0^1} \Big \| \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big \| \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m} \\
\end{aligned}$$
Let $h= |h_1| + \cdots+|h_{m}|$. It follows from the continuity of $\partial_{j_1} \partial_{j_2} \cdots \partial_{j_m} f$ at $a$ that for all $\delta > 0$ and $(t_1,\ldots,t_m) \in [0,1]^m$ there is $\epsilon > 0$ such that $$\Big \| \partial_{j_{1}} \cdots \partial_{j_{m}} f (a + t_1 h_1 e_{j_1} + \cdots + t_{m} h_m e_{j_{m}}) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big \| <\ \delta$$ for all $h < \epsilon$. As such, for all $h <\ \epsilon$, we have $$ {\int_0^1 \cdots \int_0^1} \Big \| \partial_{j_{1}} \cdots \partial_{j_{m}} f \left (a+ t_1 h_1 e_{j_{1}} + \cdots+ t_{m} h_m e_{j_{m}}\right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \Big \| \mathrm{d} t_{1} \cdots \, \mathrm{d} t_{m} < \delta$$ and consequently $$\left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| < \delta$$
Take the limit $h_{\sigma(m)} \to 0$, we have $$\lim_{h_{\sigma(m)} \to 0} \left \|\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$
and consequently $$ \left \| \lim_{h_{\sigma(m)} \to 0} \left (\frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m)}}^{h_{\sigma(m)}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m)}} \right ) - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$
and consequently $$\left \| \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \cdots\Delta_{j_{\sigma(m-1)}}^{h_{\sigma(m-1)}} \partial_{j_{m}} f (a)}{h_{\sigma(1)} \cdots h_{\sigma(m-1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta \quad \text{by} \,\, \textbf{Lemma 1}$$
Iterating this process of taking limit, we get $$\left \| \lim_{h_{\sigma(1)} \to 0} \frac{\Delta_{j_{\sigma(1)}}^{h_{\sigma(1)}} \left ( \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right) (a)}{h_{\sigma(1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$ or equivalently $$\left \| \lim_{h_{\sigma(1)} \to 0} \frac{ \left ( \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right) (a + h_{\sigma(1)} e_{j_{\sigma(1)}}) - \left (\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right)(a)}{h_{\sigma(1)}} - \partial_{j_{1}} \cdots \partial_{j_{m}} f (a) \right \| \le \delta$$
For all $\delta >0$, there is $\epsilon >0$ such that for all $|h_{\sigma(1)}| < \epsilon$, the last inequality holds. It follows that $$\partial_{j_{\sigma(1)}}\left (\partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f \right)(a) = \partial_{j_{1}} \cdots \partial_{j_{m}} f (a)$$ and consequently $$\partial_{j_{\sigma(1)}} \partial_{j_{\sigma(2)}} \cdots \partial_{j_{\sigma(m)}} f (a) = \partial_{j_{1}} \cdots \partial_{j_{m}} f (a)$$
This completes the proof.
Best Answer
I've just figured out a variant that utilized the MVT, so I posted it here. I would be great if someone helps me verify it. Thank you so much for your help!
My attempt:
For $a \in X$, we define $A \in \mathcal L(\mathbb R^n,F)$ by $ A h=\sum_{k=1}^{n} \partial_{k} f(a) h_{k}$ for all $h=\left(h_{1}, \ldots, h_{n}\right) \in X$. Our goal is to show that $\partial f(a) = A$ or equivalently $$\lim _{h \rightarrow 0} \frac{f(a+h)-f(a)-A h}{|h|_\infty}=0$$
Let $x_k = a+ (h_1,\ldots,h_k,0,\ldots,0)$ for all $k = \overline{1,n}$. It follows that $f(a+h)-f(a)=\sum_{k=1}^{n}\left(f\left(x_{k}\right)-f\left(x_{k-1}\right)\right)$. Let $\{e_1,\ldots,e_n\}$ be the standard basis of $\mathbb R^n$. By definition, we have $$\begin{aligned} h_k\partial_{k} f\left(x_{k-1}+t h_{k} e_{k}\right) &= h_k \lim_{z \to 0} \frac{f\left(x_{k-1}+t h_{k} e_{k} + ze_k\right) - f\left(x_{k-1}+t h_{k} e_{k}\right)}{z} \\ &= \lim_{z \to 0} \frac{f\left(x_{k-1}+ th_{k}e_k+(z/h_k)h_ke_{k}\right) - f\left(x_{k-1}+t h_{k} e_{k}\right)}{z/h_k}\\ &= \lim_{z' \to 0} \frac{f\left(x_{k-1}+ th_{k}e_k+z'h_ke_{k}\right) - f\left(x_{k-1}+t h_{k} e_{k}\right)}{z'}\\ &= \lim_{z \to 0} \frac{f\left(x_{k-1}+ (t+z')h_{k}e_k\right) - f\left(x_{k-1}+t h_{k} e_{k}\right)}{z'}\\ &= \frac{\partial f\left(x_{k-1}+t h_{k} e_{k}\right)}{\partial t} \end{aligned}$$
We apply Mean Value Theorem for the map $\mathbb R \to F, \quad t \mapsto f\left(x_{k-1}+t h_{k} e_{k}\right)$ and get $$\begin{aligned}f(x_k) - f(x_{k-1}) &= f(x_{k-1}+1\cdot h_ke_k)-f(x_{k-1}+0\cdot h_ke_k)\\ &= \frac{\partial f\left(x_{k-1}+t h_{k} e_{k}\right)}{\partial t} (t_k) \\ &= h_k \partial_{k} f\left(x_{k-1}+t_k h_{k} e_{k}\right)\end{aligned}$$
Consequently, $$\begin{aligned} \|f(a+h)-f(a) - Ah \| &=\left \|\sum_{k=1}^{n} h_{k} \left(\partial_{k} f\left(x_{k-1}+t_k h_{k} e_{k}\right) -\partial_{k} f(a)\right) \right \| \\ &\le \sum_{k=1}^{n} |h_{k}| \left \| \partial_{k} f\left(x_{k-1}+t_k h_{k} e_{k}\right) -\partial_{k} f(a)\right \| \\ &\le |h|_\infty \sum_{k=1}^{n} \left \| \partial_{k} f\left(x_{k-1}+t_k h_{k} e_{k}\right) -\partial_{k} f(a)\right \| \\ &\le |h|_\infty \sum_{k=1}^{n} \sup_{t \in [0,1]} \left \| \partial_{k} f\left(x_{k-1}+t h_{k} e_{k}\right) -\partial_{k} f(a) \right \| \\ &\le |h|_\infty \sum_{k=1}^{n} \sup_{|x-a|_\infty \le |h|_\infty} \left \| \partial_{k} f\left(x\right) -\partial_{k} f(a)\right \| \end{aligned}$$
We have $h \to 0$ implies $|h|_\infty \to 0$, which in turn implies $x \to a$. It follows from the continuity of $\partial_{k} f\left(\cdot\right)$ that $$\sup_{|x-a|_\infty \le |h|_\infty} \left \| \partial_{k} f\left(x\right) -\partial_{k} f(a)\right \| \to 0 \quad (x \to a)$$
Finally, $$\lim _{h \rightarrow 0} \frac{\| f(a+h)-f(a)-A h \|}{|h|_\infty} \le \lim _{h \rightarrow 0} \sum_{k=1}^{n} \sup_{|x-a|_\infty \le |h|_\infty} \left \| \partial_{k} f\left(x\right) -\partial_{k} f(a)\right \| = 0$$ Consequently, $$\lim _{h \rightarrow 0} \frac{f(a+h)-f(a)-A h}{|h|_\infty}=0$$
Hence $\partial f(a) = A$. Next we prove that $\partial f(\cdot): X \to \mathcal L(\mathbb R^n,F)$ is continuous. We have $$\begin{aligned}\|\partial f(x)h - \partial f(a)h\| &= \left\| \sum_{k=1}^{n} \partial_{k} f\left(x\right) h_{k} - \sum_{k=1}^{n} \partial_{k} f\left(a\right) h_{k} \right\| \\ &= \left\| h_k \sum_{k=1}^{n} ( \partial_{k} f\left(x\right) - \partial_{k} f\left(a\right)) \right\| \\&\le |h|_\infty \sum_{k=1}^{n} \left\|\partial_{k} f\left(x\right) - \partial_{k} f\left(a\right) \right\| \end{aligned}$$
Consequently, $$\|\partial f(x) - \partial f(y)\| = \sup_{h \in X} \frac{\|\partial f(x)h - \partial f(a)h\|}{|h|_\infty} \le \sum_{k=1}^{n} \left\|\partial_{k} f\left(x\right) - \partial_{k} f\left(a\right) \right\|$$
It follows from the continuity of $\partial_{k} f\left(\cdot\right)$ that $\sum_{k=1}^{n} \left\|\partial_{k} f\left(x\right) - \partial_{k} f\left(a\right) \right\| \to 0$ and thus $\|\partial f(x) - \partial f(y)\| \to 0$ as $x \to a$. Hence $\partial f(x) \to \partial f(y)$.