It turns out that twice-differentiability implies that the Hessian is symmetric even without convexity and with no reference to whether the second-order partial derivatives are continuous! The proof below is based on Theorem 8.12.2 in the book Foundations of Modern Analysis by Dieudonné (1969, p. 180).
Claim: Let $U\subseteq\mathbb R^n$ be an open set and $f:U\to\mathbb R$ a function. Suppose that $f$ is (Fréchet) differentiable on $U$ and that it is twice (Fréchet) differentiable at $\mathbf x_0\in U$. Then, the Hessian matrix $\mathbf H(\mathbf x_0)$ at $\mathbf x_0$ is symmetric.
Proof: Let $\mathbf D:U\to\mathbb R^n$ denote the gradient function of $f$. Fix $\varepsilon>0$. Since $\mathbf D$ is Fréchet differentiable at $\mathbf x_0$ by assumption, it follows that there exists some $\delta>0$ such that $\|\mathbf v\|<4\delta$ implies that
$$\left\|\mathbf D(\mathbf x_0+\mathbf v)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot\mathbf v\right\|\leq\varepsilon\|\mathbf v\|.$$
There is no loss of generality in taking $\delta$ to be so small that the open ball $B(\mathbf x_0,4\delta)$ of radius $4\delta$ centered at $\mathbf x_0$ is contained in the open set $U$.
For any $i,j\in\{1,\ldots,n\}$, let $\mathbf e_i$ and $\mathbf e_j$ be the corresponding standard basis vectors of unit length. Let $\mathbf s\equiv\delta\mathbf e_i$ and $\mathbf t\equiv\delta\mathbf e_j$. It is clear that $\mathbf x_0+\xi\mathbf s+\mathbf t$ and $\mathbf x_0+\xi\mathbf s$ are both in $U$ whenever $\xi\in[0,1]$; this is because $\|\xi\mathbf s+\mathbf t\|<4\delta$ and $\|\xi\mathbf s\|<4\delta$. Define the following function $g:[0,1]\to\mathbb R$:
$$g(\xi)\equiv f(\mathbf x_0+\xi\mathbf s+\mathbf t)-f(\mathbf x_0+\xi\mathbf s)\quad\forall\xi\in[0,1].$$
Clearly, $g$ is continuous on $[0,1]$ and differentiable on $(0,1)$. Lagrange's mean-value theorem, in turn, implies that there exists some $\xi\in(0,1)$ such that
$$g(1)-g(0)=g'(\xi)=\mathbf s\cdot\left[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0+\xi\mathbf s)\right],$$
using the chain rule.
Next, one can derive the following chain of inequalities (the first one uses the Cauchy–Schwarz inequality):
\begin{align*}
\left|g(1)-g(0)-\mathbf s\cdot\mathbf H(\mathbf x_0)\cdot\mathbf t\right|&\leq\underbrace{\|\mathbf s\|}_{=\delta}\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)]-\mathbf H(\mathbf x_0)\cdot\mathbf t\right\|\\
&=\delta\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s+\mathbf t)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s)]\right\|\\
&\leq\delta\varepsilon\left(\|\xi\mathbf s+\mathbf t\|+\|\xi\mathbf s\|\right)<8\delta^2\varepsilon.
\end{align*}
That is, one has that
$$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j|<8\delta^2\varepsilon,$$
and, by a completely analogous and symmetric reasoning in which $\mathbf s$ and $\mathbf t$ are interchanged,
$$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_j\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_i|<8\delta^2\varepsilon.$$
Given that $\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j=h_{ij}(\mathbf x_0)\equiv\partial^2 f/(\partial x_i\partial x_j)(\mathbf x_0)$, the preceding two inequalities imply that
$$\left|h_{ij}(\mathbf x_0)-h_{ji}(\mathbf x_0)\right|<16\varepsilon.$$
Taking $\varepsilon$ to be arbitrarily small, one sees that $h_{ij}(\mathbf x_0)=h_{ji}(\mathbf x_0)$. $\blacksquare$
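As a numerical aside (not part of the proof), one can differentiate an exact gradient by finite differences and watch the mixed partials agree; the test function below is an arbitrary smooth choice for illustration.

```python
import numpy as np

# Illustrative C^2 function f(x, y) = sin(x) e^y + x^2 y, with its exact gradient.
f = lambda v: np.sin(v[0]) * np.exp(v[1]) + v[0]**2 * v[1]
grad = lambda v: np.array([np.cos(v[0]) * np.exp(v[1]) + 2 * v[0] * v[1],
                           np.sin(v[0]) * np.exp(v[1]) + v[0]**2])

def hessian_from_gradient(grad, x0, h=1e-6):
    """Approximate H[i, j] ~ d(grad_i)/dx_j by central differences of the gradient."""
    n = len(x0)
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(x0 + e) - grad(x0 - e)) / (2 * h)
    return H

H = hessian_from_gradient(grad, np.array([0.3, -0.7]))
print(np.max(np.abs(H - H.T)))  # tiny: the mixed partials agree
```

Because the two mixed entries are computed from *different* components of the gradient, their agreement is a genuine check rather than an artifact of the difference scheme.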
In general, $g_i(\boldsymbol{x})$ is not convex. We can construct a counterexample as follows.
Consider the case of three categories, where $\boldsymbol{x} = (x_1, x_2, x_3)$. If $g_1(\boldsymbol{x})$ is convex, then $h(t) \triangleq g_1((0, t, -t))$ should also be convex.
Since
$$
\begin{aligned}
g_1(\boldsymbol{x}) &= \log \frac{\exp(x_1) + \exp(x_2) + \exp(x_3)}{\exp(x_2) + \exp(x_3)}\\
&= \log \left(1 + \frac{1}{\exp(x_2 - x_1) + \exp(x_3 - x_1)}\right)
\end{aligned}
$$
we know that
$$
h(t) = \log\left(1 + \frac{1}{e^{t} + e^{-t}}\right),
$$
and it is not convex.
Indeed, suppose $h(t)$ were convex. Since $h$ is even, we have
$$
\lim_{t \to \pm\infty} h(t) = 0.
$$
For any $t$ and any $R > 0$, convexity gives $h(t) \le \frac{1}{2}\left[h(t - R) + h(t + R)\right]$, and the right-hand side tends to $0$ as $R \to \infty$. Hence $h \le 0$ everywhere, contradicting $h(t) > 0$ for all $t$.
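The contradiction can also be seen numerically; the following sketch (on an arbitrarily chosen interval) checks that $h$ violates the midpoint inequality required by convexity.

```python
import math

# h(t) = log(1 + 1 / (e^t + e^-t)) from the derivation above.
h = lambda t: math.log(1 + 1 / (math.exp(t) + math.exp(-t)))

# Convexity would require h((a + b) / 2) <= (h(a) + h(b)) / 2.
a, b = -5.0, 5.0
mid = h((a + b) / 2)       # h(0) = log(3/2) ~ 0.405
avg = (h(a) + h(b)) / 2    # ~ 0.0067
print(mid, avg, mid > avg)  # midpoint value exceeds the average: not convex
```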
Best Answer
Using the short-hands
$$t_i=s\,e^{x_i},\qquad s=\left(\sum_je^{x_j}\right)^{-1},$$
the diagonal and off-diagonal entries of the Hessian are
$$t_i(1-t_i)\qquad\text{and}\qquad -t_it_j,$$
respectively. Hence it suffices to show that $Q(u)\geqslant0$ for every $u=(u_i)$, where
$$Q(u)=\sum_i t_i(1-t_i)u_i^2-\sum_{i\ne j}t_it_ju_iu_j.$$
But, by construction, $\sum_it_i=1$, hence
$$Q(u)=\sum_it_iu_i^2\cdot\sum_it_i-\left(\sum_it_iu_i\right)^2,$$
and the Cauchy–Schwarz inequality shows that indeed $Q(u)\geqslant0$.
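As a numerical sanity check (illustrative only), one can assemble the Hessian $\operatorname{diag}(t)-tt^\top$ from the entries above at a random point and confirm that it is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)

# Softmax weights t_i = exp(x_i) / sum_j exp(x_j), computed stably.
z = np.exp(x - x.max())
t = z / z.sum()

# Diagonal entries t_i(1 - t_i), off-diagonal entries -t_i t_j,
# i.e. H = diag(t) - t t^T.
H = np.diag(t) - np.outer(t, t)

eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() >= -1e-12)  # all eigenvalues >= 0: H is PSD
```

The smallest eigenvalue is (up to rounding) exactly $0$: taking $u=(1,\dots,1)$ gives $Q(u)=\sum_it_i-\left(\sum_it_i\right)^2=0$, the equality case of Cauchy–Schwarz.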