For prime $q \geq 5$ write the count as
$$
\frac1{1152} q (q-1) (q^3 - 21q^2 + 171 q - c_q)
$$
where
$$
c_q = 483 + 36 \left(\frac{-1}{q}\right) + 64 \left(\frac{-3}{q}\right)
+ \delta_q.
$$
Then for $(\frac{-2}{q}) = -1$ Ronald Bacher's calculations
indicate $\delta_q=0$. If $(\frac{-2}{q}) = +1$ then
$q$ can be written as $m^2 + 2n^2$, uniquely up to changing
$(m,n)$ to $(\pm m, \pm n)$, and we have
$$
\delta_q = 24(m^2 - 2n^2) + 192 + 72 \left(\frac{-1}{q}\right).
$$
The explanation is as follows. Start as did Will Sawin
by considering the variety of $(s_1,s_2,s_3,s_4,t_1,t_2,t_3,t_4)$
such that for $i=1,2,3$ the $i$-th elementary symmetric function
of the $s$'s equals the $i$-th elem.sym.fn. of the $t$'s.
We may apply any $aX+b$ transformation to all $8$ variables,
which explains the $q(q-1)$ factor. (The factor $1152 = 2 \cdot 4!^2$
is from coordinate permutations that respect the partition of the
$8$ variables into two sets of $4$.) In odd characteristic, there's
a unique representative with $\sum_{i=1}^4 s_i = \sum_{i=1}^4 t_i = 0$;
this takes care of the translations, and then we mod out by scalars
by going to projective space. We end up with the complete intersection
of a quadric and a sextic in ${\bf P}^5$. This threefold, call it
${\cal M}$, turns out to be rational. (This has probably been known
for some time, because ${\cal M}$ classifies perfect multigrades of order $4$,
and such things have been studied since the mid-19th century, see the
Prouhet-Tarry-Escott problem; I outline a proof below.)
However, by requiring that all coordinates be distinct
we're removing some divisor ${\cal D}$ on this threefold,
so the final count decreases by the outcome of an inclusion-exclusion formula
whose terms are point counts over some subvarieties of ${\cal M}$.
Most of these sub varieties are rational curves, or points that may be defined
over ${\bf Q}(i)$ or ${\bf Q}(\sqrt{-3})$, the latter explaining
the appearance of Legendre symbols $(\frac{-1}{p})$, $(\frac{-3}{p})$
in the counting formula. But the two-dimensional components of ${\cal D}$
are isomorphic K3 surfaces, arising as a complete intersection of
a quadric and a cubic in ${\bf P}^4$; and those components make a more complicated
contribution. Fortunately these K3 surfaces are "singular" (i.e. their
Picard number attains the maximum of $20$ for a K3 surface in
characteristic zero) $-$ I computed that they're birational with
the universal elliptic curve over $X_1(8)$ $-$ and it is known that
the point-count of this singular K3 surface can be given by a formula
that involves $m^2-2n^2$ when $(\frac{-2}{q}) = +1$.
To show that $\cal M$ is rational, it is convenient to apply a linear
change of variables from the "$A_3$" coordinates $s_i,t_i$ to
"$D_3$" coordinates, say $a,b,c$ and $d,e,f$, with
$$
s_i = a+b+c, \phantom+ a-b-c, \phantom+ -a+b-c, \phantom+ -a-b+c
$$
and likewise $t_i = d+e+f, \phantom. d-e-f, \phantom. -d+e-f, \phantom. -d-e+f$.
Then $\sum_{i=1}^4 s_i = \sum_{i=1}^4 t_i = 0$ holds automatically,
and the quadric and cubic become simply
$$
a^2+b^2+c^2 = d^2+e^2+f^2,
\phantom\infty
abc = def.
$$
Let $d=pa$ and $e=qb$. Then $f=(pq)^{-1}c$, and the quadric becomes
a conic in the $(a:b:c)$ plane with coefficients depending on $p,q$:
$$
(p^2-1)a^2 + (q^2-1)b^2 + ((pq)^{-2}-1) c^2 = 0.
$$
So $\cal M$ is birational to a conic bundle over the $(p,q)$ plane,
and this conic bundle has a section $(a:b:c:d:e:f) = (1:p:pq:p:pq:1)$
which lets us birationally identify $\cal M$ with the product of the
$(p,q)$ plane with ${\bf P}^1$. This is a rational threefold, QED.
We have
$$ \# \{x\in \mathbb{F}_{q^{2n}}|\mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}(x)=0 \hbox{ and } x=y^n \textrm{ for some }y\in \mathbb{F}_{q^{2n}}\} = \frac{\#\{y\in \mathbb{F}_{q^{2n}}|\mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}(y^n)=0 \} +n-1}{ n}$$
since $\mathbb F_{q^n}$ contains all the $n$th roots of unity, so it suffices to calculate.
\begin{align*}
\# \{y\in \mathbb{F}_{q^{2n}}|\mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}(y^n)=0 \} &= \frac{1}{q^2} \sum_{\lambda \in \mathbb F_{q^2} } \sum_{ y \in \mathbb F_{q^{2n}} }\psi_2 (\lambda \mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}(y^n) ) \\ &= \frac{1}{q^2} \sum_{\lambda \in \mathbb F_{q^2} } \sum_{ y \in \mathbb F_{q^{2n}} }\psi_{2n} (\lambda y^n) ) \\& = q^{2n-2} + \frac{1}{q^2} \sum_{\lambda \in \mathbb F_{q^2}^\times } \sum_{ y \in \mathbb F_{q^{2n}} }\psi_{2n} (\lambda y^n) ) \\ &=q^{2n-2} + \frac{1}{q^2} \sum_{\lambda \in \mathbb F_{q^2}^\times } \sum_{ x \in \mathbb F_{q^{2n}} } \sum_{\substack{ \chi \colon \mathbb F_{q^{2n}}^\times \to \mathbb C^\times \\ \chi^n=1 } } \psi_{2n} (\lambda x) \chi(x) \\ &=q^{2n-2} + \frac{1}{q^2} \sum_{\lambda \in \mathbb F_{q^2}^\times } \sum_{ x \in \mathbb F_{q^{2n}} } \sum_{\substack{ \chi \colon \mathbb F_{q^{2n}}^\times \to \mathbb C^\times \\ \chi^n=1 } } \psi_{2n} (x) \chi(\lambda^{-1} x) \end{align*}
where $\psi_2 \colon \mathbb F_{q^2} \to \mathbb C^\times$ is a nondegenerate additive character and $\psi_{2n} = \psi_2 \circ \mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}$
Now since $\chi$ has order $n$ and the quotient group $\mathbb F_{q^{2n}}^\times / \mathbb F_{q^2}^\times $ has order $q^{2n-2} + q^{2n-4} + \dots + 1$ which is divisible by $n$, $\chi$ must factor through this quotient group so $\chi(\lambda^{-1})=1$, giving
$$\#\{y\in \mathbb{F}_{q^{2n}}|\mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}(y^n)=0 \} = q^{2n-2} + \frac{q^2-1}{q^2} \sum_{\substack{ \chi \colon \mathbb F_{q^{2n}}^\times \to \mathbb C^\times \\ \chi^n=1 } } \sum_{ x \in \mathbb F_{q^{2n}} } \psi_{2n} (x) \chi( x) .$$
Here the inner sum $\sum_{ x \in \mathbb F_{q^{2n}} } \psi_{2n} (x) \chi( x)$ vanishes for $\chi$ trivial and is a Gauss sum of absolute value exactly $q^{n}$ for $\chi$ trivial. This gives an estimate with main term and error term
$$ \left| \#\{y\in \mathbb{F}_{q^{2n}}|\mathrm{Tr}_{\mathbb{F}_{q^{2n}}/\mathbb{F}_{q^2}}(y^n)=0 \} -q^{2n-2} \right| \leq (n-1) (q^2-1) q^{n-2} $$
Best Answer
I can show that exceptions occur at most for $n=3$. (Primality of $n$ is never used.)
Since $n$ is odd, $\mathbb F_{q^{2n}} = \mathbb F_{q^2} \otimes_{\mathbb F_q} \mathbb F_{q^n}$. The trace map $\operatorname{Tr}:\mathbb F_{q^{2n}} \to \mathbb F_{q^2}$ is obtained by tensoring the identity map $\mathbb F_{q^2} \to \mathbb F_{q^2}$ with the trace map $\operatorname{tr} : \mathbb F_{q^n} \to \mathbb F_q$.
Thus, choosing an arbitrary basis of $\mathbb F_{q^2}$, we can write any $a$ as a pair of elements $a_1,a_2 \in \mathbb F_{q^n}$, and your condition that $y \in \mathbb F_{q^n}$ satisfies $\operatorname{Tr}(y)\neq 0$ but $\operatorname{Tr}(ay)=0$ is equivalent to the condition that $\operatorname{tr} (y) \neq 0$ but $\operatorname{tr} (a_1 y ) =\operatorname{tr}(a_2 y)=0$. (We can ignore the condition that $y\neq 0$ as it is implied by the condition that $y$ has trace zero.)
Since the trace map of a product is a perfect $\mathbb F_q$-linear pairing on $\mathbb F_q^n$, such a $y$ exists unless $1$ is an $\mathbb F_q$-linear combination of $a_1$ and $a_2$.
I will show there must exist a member of $\mathcal A$ that has this unusual property by bounding the number of members of $\mathcal A$ that do have this unusual property.
Note that every member of $\mathcal A$ is in the subgroup of order $q^n+1$, thus has norm to $\mathbb F_{q^n}$ equal to $1$. This is a nonsingular quadratic equation in $a_1,a_2$. For each $\lambda_1,\lambda_2$ in $\mathbb F_q$, not both zero, $\lambda_1 a_1 + \lambda_2 a_2 =1$ is a linear equation. There can be at most two solutions to a linear equation together with an nonsingular quadratic equation in two variables, since it gives a nontrivial quadratic equation in one variable.
Summing over possible choices of $\lambda_1,\lambda_2$, the number of members of $\mathcal A $ with this unusual property is at most $2 (q^2-1)$. So we can only have all members of $\mathcal A$ with this property if
$$ \frac{q^n+1}{q+1} \geq 2 (q^2-1)$$ i.e.
$$q^n+1 \geq 2 (q^2-1) (q+1).$$
For $n\geq 5$, the left side dominates the right side for any $q$.