It turns out that twice-differentiability implies that the Hessian is symmetric even without convexity and with no reference to whether the second-order partial derivatives are continuous! The proof below is based on Theorem 8.12.2 in the book Foundations of Modern Analysis by Dieudonné (1969, p. 180).
Claim: Let $U\subseteq\mathbb R^n$ be an open set and $f:U\to\mathbb R$ a function. Suppose that $f$ is (Fréchet) differentiable on $U$ and that it is twice (Fréchet) differentiable at $\mathbf x_0\in U$. Then, the Hessian matrix $\mathbf H(\mathbf x_0)$ at $\mathbf x_0$ is symmetric.
Proof: Let $\mathbf D:U\to\mathbb R^n$ denote the gradient function of $f$. Fix $\varepsilon>0$. Since $\mathbf D$ is Fréchet differentiable at $\mathbf x_0$ by assumption, it follows that there exists some $\delta>0$ such that $\|\mathbf v\|<4\delta$ implies that
$$\left\|\mathbf D(\mathbf x_0+\mathbf v)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot\mathbf v\right\|\leq\varepsilon\|\mathbf v\|.$$
There is no loss of generality in taking $\delta$ to be so small that the open ball $B(4\delta,\mathbf x_0)$ is contained in the open set $U$.
For any $i,j\in\{1,\ldots,n\}$, let $\mathbf e_i$ and $\mathbf e_j$ be the corresponding standard basis vectors of unit length. Let $\mathbf s\equiv\delta\mathbf e_i$ and $\mathbf t\equiv\delta\mathbf e_j$. It is clear that $\mathbf x_0+\xi\mathbf s+\mathbf t$ and $\mathbf x_0+\xi\mathbf s$ are both in $U$ whenever $\xi\in[0,1]$; this is because $\|\xi\mathbf s+\mathbf t\|<4\delta$ and $\|\xi\mathbf s\|<4\delta$. Define the following function $g:[0,1]\to\mathbb R$:
$$g(\xi)\equiv f(\mathbf x_0+\xi\mathbf s+\mathbf t)-f(\mathbf x_0+\xi\mathbf s)\quad\forall\xi\in[0,1].$$
Clearly, $g$ is continuous on $[0,1]$ and differentiable on $(0,1)$. Lagrange's mean-value theorem, in turn, implies that there exists some $\xi\in(0,1)$ such that
$$g(1)-g(0)=g'(\xi)=\mathbf s\cdot\left[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0+\xi\mathbf s)\right],$$
where the second equality follows from the chain rule.
Next, one can derive the following chain of inequalities (the first one uses the Cauchy–Schwarz inequality):
\begin{align*}
\left|g(1)-g(0)-\mathbf s\cdot\mathbf H(\mathbf x_0)\cdot\mathbf t\right|
&\leq\underbrace{\|\mathbf s\|}_{=\delta}\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)]-\mathbf H(\mathbf x_0)\cdot\mathbf t\right\|\\
&=\delta\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s+\mathbf t)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s)]\right\|\\
&\leq\delta\varepsilon\left(\|\xi\mathbf s+\mathbf t\|+\|\xi\mathbf s\|\right)<8\delta^2\varepsilon.
\end{align*}
That is, one has that
$$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j|<8\delta^2\varepsilon,$$
and, by a completely analogous and symmetric reasoning in which $\mathbf s$ and $\mathbf t$ are interchanged,
$$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_j\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_i|<8\delta^2\varepsilon.$$
Given that $\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j=h_{ij}(\mathbf x_0)\equiv\partial^2 f/(\partial x_i\partial x_j)(\mathbf x_0)$, the triangle inequality applied to the preceding two inequalities gives $\delta^2\left|h_{ij}(\mathbf x_0)-h_{ji}(\mathbf x_0)\right|<16\delta^2\varepsilon$, and dividing by $\delta^2>0$,
$$\left|h_{ij}(\mathbf x_0)-h_{ji}(\mathbf x_0)\right|<16\varepsilon.$$
Taking $\varepsilon$ to be arbitrarily small, one sees that $h_{ij}(\mathbf x_0)=h_{ji}(\mathbf x_0)$. $\blacksquare$
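The mechanism of the proof is easy to probe numerically: the symmetric second-order difference $f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)$ lies within $8\delta^2\varepsilon$ of both $\delta^2 h_{ij}(\mathbf x_0)$ and $\delta^2 h_{ji}(\mathbf x_0)$, which is what forces the two entries to coincide. Here is a minimal sketch of that difference quotient; the test function and the evaluation point are illustrative choices, not part of the proof.

```python
import numpy as np

# Minimal numerical sketch: the symmetric second-order difference from the
# proof, divided by delta^2, approximates the mixed partial h_ij(x0); since
# the expression is unchanged when e_i and e_j are swapped, it approximates
# h_ji(x0) equally well.

def f(x):
    # arbitrary smooth test function on R^3 (illustrative choice)
    return np.sin(x[0] * x[1]) + x[0] ** 3 * x[2]

def symmetric_difference(f, x0, i, j, delta):
    # [f(x0+s+t) - f(x0+s) - f(x0+t) + f(x0)] / delta^2, with s = delta*e_i, t = delta*e_j
    e = np.eye(len(x0))
    s, t = delta * e[i], delta * e[j]
    return (f(x0 + s + t) - f(x0 + s) - f(x0 + t) + f(x0)) / delta ** 2

x0 = np.array([0.7, -0.3, 1.2])

# analytic mixed partial: d^2 f / (dx_1 dx_3) = d^2 f / (dx_3 dx_1) = 3 * x_1^2
print("analytic mixed partial:", 3.0 * x0[0] ** 2)
for delta in (1e-2, 1e-3, 1e-4):
    print(f"delta = {delta:g}:", symmetric_difference(f, x0, 0, 2, delta))
```

As $\delta$ shrinks, the printed quotient approaches the analytic value, illustrating how the single symmetric quantity pins down both $h_{ij}(\mathbf x_0)$ and $h_{ji}(\mathbf x_0)$.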
There is no trick: you've hit upon the three ways to deal with the situation. Either work in coordinates, write down the derivative of $\mathcal{E}$ as a rank-three tensor that takes in matrices and outputs vectors, or "vectorize" (flatten) your matrix before taking the derivative.
Sometimes $L$ has a special form that makes one of these approaches easier; for example, if $L$ is the squared Frobenius norm of the difference,
$$L(M,N) = \operatorname{tr}[ (M-N)^T(M-N)]$$
then it is easy to "vectorize" $L$; or alternatively
$$dL = \operatorname{tr}[(dM-dN)^T(M-N) + (M-N)^T(dM-dN)] = 2(M-N) : (dM-dN),$$
where $:$ is the Frobenius product. This is not a Jacobian in matrix form (which, as you note, doesn't exist), but it's the best you can get.
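As a concrete check of the Frobenius-norm example, here is a small sketch (the shapes, random inputs, and finite-difference test are my own illustration, not from the question): flattening $M$ turns $L$ into an ordinary squared Euclidean norm, and the gradient with respect to the flattened $M$ reshapes back to the matrix $2(M-N)$ read off from $dL$ above.

```python
import numpy as np

# Sketch: "vectorize" L(M, N) = tr[(M-N)^T (M-N)] and check that the gradient
# with respect to the flattened M reshapes to 2*(M - N), matching dL above.
# (Shapes, random data, and the finite-difference check are illustrative.)

def L_vec(m_flat, n_flat):
    # L on flattened arguments is just a squared Euclidean norm
    return np.sum((m_flat - n_flat) ** 2)

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 4))
N = rng.standard_normal((3, 4))
m, n = M.ravel(), N.ravel()

# central finite-difference gradient of L with respect to the flattened M
eps = 1e-6
grad = np.array([(L_vec(m + eps * e, n) - L_vec(m - eps * e, n)) / (2 * eps)
                 for e in np.eye(m.size)])

print(np.allclose(grad.reshape(M.shape), 2 * (M - N), atol=1e-6))  # True
```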
Best Answer
This is true, as a consequence of the Poincaré lemma for differential forms.
In $\mathbb{R}^3$, you may be familiar with the fact that a vector field $\mathbf{f}$ is a gradient $\mathbf{f}=\operatorname{grad}y$ if and only if $\operatorname{curl}\mathbf{f}=0$. The Poincaré lemma generalizes this, and the curl is replaced by the antisymmetrized derivative, which vanishes in your case.
For a more elementary argument, note that if such a $y$ exists, then we must have $y(p)-y(q)=\int_\gamma\mathbf{f}\cdot d\mathbf{l}$, where $\gamma$ is any smooth path from $q$ to $p$. Setting $y(\mathbf{0})=0$ and taking $\gamma(t)=t\mathbf{x}$, we must therefore have $$ y(\mathbf{x})=\int_0^1\mathbf{f}(t\mathbf{x})\cdot\mathbf{x}\,dt. $$ To check that this $y$ satisfies $\mathbf{f}=\operatorname{grad} y$, we can compute the gradient using index notation (with summation over the repeated index $j$). Let $F_i(\mathbf{x},t)=f_i(t\mathbf{x})$, and note that, by the chain rule, $\frac{\partial F_i}{\partial t}(\mathbf{x},t)=x_j\frac{\partial f_i}{\partial x_j}(t\mathbf{x})$. Then
\begin{align*}
\frac{\partial y}{\partial x_i}(\mathbf{x})&=\frac{\partial}{\partial x_i}\left(x_j\int_0^1 f_j(t\mathbf{x})\,dt\right)\\
&=\int_0^1 f_i(t\mathbf{x})\,dt+x_j\int_0^1 t\,\frac{\partial f_j}{\partial x_i}(t\mathbf{x})\,dt\\
&=\int_0^1 f_i(t\mathbf{x})\,dt+x_j\int_0^1 t\,\frac{\partial f_i}{\partial x_j}(t\mathbf{x})\,dt\\
&=\int_0^1 f_i(t\mathbf{x})\,dt+\int_0^1 t\,\frac{\partial F_i}{\partial t}(\mathbf{x},t)\,dt\\
&=\int_0^1 f_i(t\mathbf{x})\,dt+\Bigl[t\,F_i(\mathbf{x},t)\Bigr]_{t=0}^{t=1}-\int_0^1 F_i(\mathbf{x},t)\,dt\\
&=f_i(\mathbf{x}),
\end{align*}
where the third equality uses the assumed symmetry $\partial f_j/\partial x_i=\partial f_i/\partial x_j$ and the fifth integrates by parts.
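For a quick numerical sanity check of this construction (the test field, quadrature rule, and finite-difference comparison below are my own additions, not from the answer), take an $\mathbf{f}$ with symmetric Jacobian, approximate $y(\mathbf{x})=\int_0^1\mathbf{f}(t\mathbf{x})\cdot\mathbf{x}\,dt$ by quadrature, and verify $\operatorname{grad}y\approx\mathbf{f}$:

```python
import numpy as np

# Sketch: build y(x) = \int_0^1 f(tx) . x dt by midpoint quadrature for a test
# field f with symmetric Jacobian (here f = grad of phi(x) = sin(x1)*x2 + x2^2*x3),
# then check grad y == f by central finite differences.  The field and the
# numerical scheme are illustrative choices.

def f(x):
    return np.array([
        np.cos(x[0]) * x[1],              # d phi / d x1
        np.sin(x[0]) + 2 * x[1] * x[2],   # d phi / d x2
        x[1] ** 2,                        # d phi / d x3
    ])

def y(x, n=2000):
    ts = (np.arange(n) + 0.5) / n         # midpoints of [0, 1]
    return sum(f(t * x) @ x for t in ts) / n

x = np.array([0.9, -1.1, 0.4])

# central finite-difference gradient of y, compared with f(x)
eps = 1e-5
grad_y = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad_y, f(x), atol=1e-4))  # True
```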