I found that a good way to handle this sum is to use Bernstein's inequality:
Let $X_1, \ldots, X_n$ be independent zero-mean random variables. Suppose that $|X_i|\leq M$ almost surely for all $i$. Then, for all positive $t$,
$$\Pr \left (\sum_{i=1}^n X_i \ge t \right ) \leq \exp \left(
-\frac{t^2/2}{\sum_{i=1}^n \mathbb{E}[X_i^2] + Mt/3} \right).$$
We have
$$
\Pr[X^2Y^2\ge t]
\le E[(XY)^{2p}]t^{-p}
\le 2(2p/e)^{2p}t^{-p}
\le 2 e^{-\sqrt t},
$$
taking $p=\sqrt{t}/2$; with this choice, $2(2p/e)^{2p}t^{-p} = 2\,t^{\sqrt t/2}e^{-\sqrt t}\,t^{-\sqrt t/2} = 2e^{-\sqrt t}$. Hence we can take a union bound over all $k$ products $X_iY_i$ to get:
$$
\begin{align}
\Pr\left(
\sum_{i=1}^k X_i^2Y_i^2\ge (1+\epsilon)k
\right)
&\le
\exp \left(
-\frac{k\epsilon^2/2}{9+ M\epsilon/3} \right)
+2k \exp(-\sqrt M)\le 2\delta.
\end{align}
$$
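For the variance term, note that for standard normals each centered summand has
$$E[(X_i^2Y_i^2-1)^2] = E[X^4]\,E[Y^4]-1 = 3\cdot3-1 = 8 \le 9,$$
which is where the $9$ comes from (the effect of truncating at $M$ is ignored here; it should only shrink this).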
Taking $M=\log^2((2 k)/\delta)$, it suffices to have
$$\begin{align}k
&=
\frac{18 \log \left(\frac{1}{\delta}\right)}{\epsilon^2}+\frac{2 \log \left(\frac{1}{\delta}\right) \log^2 \left(\frac{2 k}{\delta}\right)}{3 \epsilon}
=O(\epsilon^{-2}\log(1/\delta)+\epsilon^{-1}\log^3(2k/\delta)).
\end{align}$$
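To spell out the algebra: the Bernstein term is at most $\delta$ once $\frac{k\epsilon^2/2}{9+M\epsilon/3}\ge\log(1/\delta)$, i.e. once
$$k \ge \frac{18\log(1/\delta)}{\epsilon^2}+\frac{2M\log(1/\delta)}{3\epsilon},$$
and substituting $M=\log^2(2k/\delta)$ gives the expression above.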
This matches exactly what we would expect from the central limit theorem in the first term, and nearly matches the bound we would expect from a "single large term" in the second term.
The only remaining question is whether we can get rid of the third power, to just have $\epsilon^{-1}\log^2(k/\delta)$ in the second term. I don't know how to do that though.
Update
The previous approach lost a factor $\log(1/\delta)$ in the number of samples.
This can be avoided by the following neat trick: instead of using Bernstein's inequality, just use $(XY)^2\le M^{1/2}|XY|\le \sqrt M (X^2+Y^2)/2$, where the first inequality holds on the truncation event $(XY)^2\le M$ and the second is AM-GM.
That gives us
$$\begin{align}
\Pr\left[\sum_iX_i^2Y_i^2 \ge (1+\epsilon)n \;\middle|\; \max_i X_i^2Y_i^2\le M\right]
&\le E\left[\exp\left(\lambda n X^2Y^2\right)\right]/\exp\left(\lambda(1+\epsilon)n\right)
\\&\le E\left[\exp\left(\lambda n \sqrt M (X^2+Y^2)/2\right)\right]/\exp\left(\lambda(1+\epsilon)n\right)
\\&= \frac{1}{1-\lambda n\sqrt M}\Big/\exp\left(\lambda(1+\epsilon)n\right)
\\&=\frac{(1+\epsilon)n}{\sqrt M}\exp\left(1 - \frac{(1+\epsilon)n}{\sqrt M}\right)
\\&=\frac{(1+\epsilon)n}{\log(2n/\delta)}
\exp\left(1 -
\frac{(1+\epsilon)n}{\log(2n/\delta)}
\right),
\end{align}$$
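Two facts are implicitly used in this chain. The first step can be read as Markov's inequality combined with $E[e^{\lambda\sum_i Z_i}]\le E[e^{\lambda n Z_1}]$ for i.i.d. $Z_i$, which follows from Jensen's inequality applied to the convex function $x\mapsto e^{\lambda n x}$. The third step uses that $(X^2+Y^2)/2$ is exponentially distributed with mean $1$ when $X,Y$ are independent standard normals, so $E[e^{s(X^2+Y^2)/2}]=1/(1-s)$ for $s<1$, here with $s=\lambda n\sqrt M$ (the effect of conditioning on the truncation event is glossed over).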
Here we take $\lambda=((1+\epsilon)n-M)/(M (1+\epsilon)n)$ and, in the last line, $M=\log^2(2n/\delta)$. (The alternative choice $M=(1+\epsilon)n$ gives $\sqrt{(1+\epsilon)n}\,\exp\left(1 - \sqrt{(1+\epsilon)n}\right)$ instead.)
The union bound over the single elements then adds $2n\exp(-\sqrt M)$, which is exactly $\delta$ for $M=\log^2(2n/\delta)$, and $2n\exp(-\sqrt{(1+\epsilon)n})$ for $M=(1+\epsilon)n$.
We see that taking $n\approx\epsilon^{-2}\log1/\delta + \epsilon^{-1}(\log1/\delta)^2$ now suffices.
The method is nearly as versatile as the Bernstein approach, and it gives the optimal values.
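As a quick sanity check, not part of the argument, here is a small Monte Carlo sketch of the tail probability both updates bound, assuming the $X_i,Y_i$ are i.i.d. standard normal (the parameters are arbitrary):

```python
# Monte Carlo estimate (illustration only) of Pr[sum_i X_i^2 Y_i^2 >= (1+eps) n]
# for i.i.d. standard normal X_i, Y_i.
import numpy as np

rng = np.random.default_rng(0)
n, eps, trials = 200, 0.5, 20_000

X = rng.standard_normal((trials, n))
Y = rng.standard_normal((trials, n))
S = (X**2 * Y**2).sum(axis=1)        # one realization of the sum per row
p_hat = (S >= (1 + eps) * n).mean()  # empirical tail probability

print(f"n={n}, eps={eps}: empirical tail = {p_hat:.3e}")
```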
Update 2
Perhaps a more general approach is to use moments instead of the moment generating function.
Define $\|X\|_p = E[|X|^p]^{1/p}$ and note the Khintchine-type inequality $\|X\|_p \le C \sqrt{p}$ for standard normal random variables, where $C$ is a universal constant (e.g. $C=3$ works).
Define $x\lesssim y$ to mean $\le$ up to a universal constant.
We now have $\|XY\|_p \le \|X\|_{2p}\|Y\|_{2p}$ by Cauchy-Schwarz,
and $\|X^2\|_p = E[X^{2p}]^{1/p}=\|X\|_{2p}^2$.
From the triangle inequality we get $\|X-E[X]\|_p\le \|X\|_p + |E[X]|$.
Hence $\|X^2Y^2-1\|_p\lesssim p^2+1\lesssim p^2$ (assuming $p\ge 1$).
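Chaining these facts gives the last claim explicitly:
$$\|X^2Y^2\|_p=\|XY\|_{2p}^2\le \left(\|X\|_{4p}\|Y\|_{4p}\right)^2\le \left(C\sqrt{4p}\right)^4 = 16C^4p^2,$$
so $\|X^2Y^2-1\|_p\le 16C^4p^2+1$.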
The final step is to use Latała's inequality for sums of random variables, in particular the version in Corollary 38 of our eventual paper:
Let $X_1,\dots,X_n$ be i.i.d. mean-zero random variables, and assume further that $\|X_i\|_p \lesssim p^\alpha$ for all $p\ge 2$, where $\alpha\ge 1$.
Then $\|\sum_{i=1}^n X_i\|_p \lesssim \max\{\sqrt{pn}, (n/p)^{1/p}p^\alpha\}$.
In this simplified version the constant hidden in $\lesssim$ may depend on $\alpha$, but that doesn't matter for our purposes.
The final piece is Markov's inequality with powers, and we get
$$\begin{align}
\Pr[\sum_{i=1}^n X_i^2Y_i^2 \ge (1+\epsilon)n]
&\le E\left[\left|\sum_{i=1}^n (X_i^2Y_i^2-1)\right|^p\right]/(\epsilon n)^p
\\&\lesssim \left(\frac{\max\{\sqrt{np} , (n/p)^{1/p} p^2\}}{\epsilon n}\right)^p
\\&\le
\exp\left(-\frac{\epsilon^2n}{2e}\right)
+\sqrt{n/\epsilon}
\exp\left(1-\frac{2\sqrt{\epsilon n}}{e}\right),
\end{align}$$
where the last inequality comes from taking either $p=\epsilon^2 n/e$ or $p=\sqrt{\epsilon n}/e$.
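For the record, the arithmetic behind that last step: with $p=\epsilon^2 n/e$ the first branch of the max gives
$$\left(\frac{\sqrt{np}}{\epsilon n}\right)^p=\left(\frac{p}{\epsilon^2n}\right)^{p/2}=e^{-p/2}=\exp\left(-\frac{\epsilon^2n}{2e}\right),$$
while with $p=\sqrt{\epsilon n}/e$ the second branch gives
$$\left(\frac{(n/p)^{1/p}p^2}{\epsilon n}\right)^p=\frac np\left(\frac{p^2}{\epsilon n}\right)^p=\frac np\,e^{-2p}= \sqrt{n/\epsilon}\,\exp\left(1-\frac{2\sqrt{\epsilon n}}{e}\right).$$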
We see that this bound actually improves on the previous method by a factor $\sqrt{n/\epsilon}$, and we can again take $n\approx\epsilon^{-2}\log1/\delta + \epsilon^{-1}(\log1/\delta)^2$.
A more general result is in the paper we eventually wrote.
The method using moments is very general, and the availability of Latała's inequality makes it perhaps more useful than the exponential method.
Best Answer
Let's imitate the proof of the Chernoff bound more closely. Writing $Y = \sum_{i=1}^n Y_i$ with the $Y_i$ i.i.d. uniform on $\{-1,0,1\}$, we have $$ \mathbb E[e^{tY}] = \prod_{i=1}^n \mathbb E[e^{t Y_i}] = \left(\frac{e^{-t} + 1 + e^t}{3}\right)^n. $$ Further, for any $t > 0$ we have $\Pr[Y \ge \delta n] = \Pr[e^{tY} \ge e^{t \delta n}] \le \left(\frac{e^{-t} + 1 + e^t}{3}\right)^n e^{-t \delta n}.$
The Taylor series expansion of $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right)$ begins $\frac13 t^2 - \frac1{36}t^4 + \frac{13}{3240} t^6 \pm \cdots$. This leads us to hope that we have $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) \le \frac{t^2}{3}$ for all $t$, and we do.
To prove this: write $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right)$ as $f(t^2)$ where $f(u) = \ln\left(\frac{1 + 2\cosh \sqrt u}{3}\right)$. Then $f'(0) = \frac13$ (as we expected from the earlier Taylor series) and \begin{align} f''(u) &= \frac{\left(\cosh \sqrt{u}+2\right)-\frac{\sinh \sqrt{u}}{\sqrt u} \left(2 \cosh \sqrt{u}+1\right)}{2 u \left(2 \cosh \sqrt{u}+1\right)^2} \\ &\le \frac{\left(\cosh \sqrt{u}+2\right)-1 \cdot \left(2 \cosh \sqrt{u}+1\right)}{2 u \left(2 \cosh \sqrt{u}+1\right)^2}\\ &= \frac{1 - \cosh \sqrt u}{2 u \left(2 \cosh \sqrt{u}+1\right)^2} \le 0 \end{align} for all $u \ge 0$, since $\cosh$ is always at least $1$ and the denominator is nonnegative. Therefore $f(u)$ is concave for nonnegative $u$ and the tangent line $f(0) + f'(0) u$ is above $f(u)$ when $u \ge 0$. Substituting $u = t^2$, we get: $$ f(u) \le f(0) + f'(0) u \implies f(u) \le \frac13 u \implies \ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) \le \frac13 t^2. $$ Due to the Taylor series expansion, this is the best bound of its type: as $t \to 0$, we have $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) = \frac13 t^2 + O(t^4)$, so for all $\epsilon>0$, given sufficiently small $t$ we get $\ln\left(\frac{e^{-t} + 1 + e^t}{3}\right) > (\frac13 - \epsilon)t^2$. So we cannot improve on the inequality used above - not if we want to get a bound of the form $e^{-C\delta^2n}$ at the end.
Replacing $\frac{e^{-t} + 1 + e^t}{3}$ by its upper bound $e^{t^2/3}$, we get $$ \Pr[Y \ge \delta n] \le e^{\frac13t^2n - t\delta n} $$ and now the best choice of $t$ is $t = \frac32 \delta$, which gives us $$ \Pr[Y \ge \delta n] \le e^{-\frac34 \delta^2 n}. $$ This is the best bound of this type we could come up with using this method: by the Taylor series argument, we have captured the behavior of the Markov-type upper bound as $\delta \to 0$. (We can't rule out, though, that there's a better bound we can prove that doesn't go through the same application of Markov's inequality.) For $\delta$ further from $0$, we might also get better behavior by not ignoring the higher-order terms in $t$; this happens for the ordinary Chernoff bound as well.
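To close, a small numerical illustration, not part of the proof: it checks the key inequality $\ln\left(\frac{e^{-t}+1+e^t}{3}\right)\le \frac{t^2}{3}$ on a grid and compares the final bound to a simulated tail (the parameters are arbitrary):

```python
# (1) Check ln((e^-t + 1 + e^t)/3) <= t^2/3 on a grid.
# (2) Compare Pr[Y >= delta*n] with the bound exp(-3*delta^2*n/4) by simulation,
#     where Y is a sum of n i.i.d. uniform {-1,0,1} variables.
import numpy as np

t = np.linspace(-10, 10, 2001)
lhs = np.log((np.exp(-t) + 1 + np.exp(t)) / 3)
print("inequality holds on grid:", bool(np.all(lhs <= t**2 / 3 + 1e-12)))

rng = np.random.default_rng(1)
n, delta, trials = 300, 0.15, 50_000
Yi = rng.integers(-1, 2, size=(trials, n), dtype=np.int8)  # uniform on {-1,0,1}
Y = Yi.sum(axis=1, dtype=np.int32)
p_hat = (Y >= delta * n).mean()
print(f"empirical tail: {p_hat:.2e}   bound: {np.exp(-0.75 * delta**2 * n):.2e}")
```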