You don't need to worry about expectations and covariances, since the difference operator kills the polynomial term. In other words, the result turns on what the difference operator does to polynomials. Once you have taken $d$ differences, you are only left with the stochastic part of your model.
So let's prove this. Since the difference operator is linear, you only have to prove the result for $t^d$.
I would go for a proof by induction on $d$. For $d=1$, the first difference kills the linear term, since $(at+b) - (a(t-1)+b) = a$. So a first difference turns a linear trend into a stationary time series with (possibly) non-zero mean.
For a higher degree, say $d$, the binomial theorem gives $t^d-(t-1)^d = dt^{d-1}+\text{a polynomial of degree}\ d-2$, so the first difference lowers the degree by one, with leading term $dt^{d-1}$.
See what happens when we take the second difference. We have $dt^{d-1}$ (plus lower-order terms) against $d(t-1)^{d-1}$ (plus lower-order terms) from differencing the next pair of terms. The induction hypothesis applies here, because the two leading terms have the same coefficient $d$. Note that we are using the assumption that the time series is sampled at equally spaced moments; otherwise we would have got a different leading term in the second "first difference". We usually make this assumption for time series, and here you can see that it matters.
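As a quick numerical sanity check of the argument above, here is a sketch using NumPy's `np.diff` (the degree-3 polynomial and its coefficients are arbitrary choices for illustration): after $d$ differences, a degree-$d$ polynomial trend collapses to the constant $d!$ times its leading coefficient, and one more difference gives zero.

```python
import numpy as np

t = np.arange(0, 20)                  # equally spaced sample times
d = 3
poly = 2.0 * t**d - 5.0 * t + 7.0     # an arbitrary degree-d polynomial trend

# apply the difference operator d times
diffed = np.diff(poly, n=d)

# the d-th difference is constant: d! * (leading coefficient) = 6 * 2 = 12
assert np.allclose(diffed, 12.0)
# one more difference kills the polynomial entirely
assert np.allclose(np.diff(poly, n=d + 1), 0.0)
```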
You can argue inductively for both the necessary and the sufficient condition.
You still have to deal with $X_t$, the stationary part of the expression. You need to show that the first difference of a stationary time series is also stationary - but that's obvious. You only need to show this for the first difference; the rest follows by induction.
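To spell out the "obvious" step: writing $\nabla X_t = X_t - X_{t-1}$ and $\gamma$ for the autocovariance function of the stationary series $X_t$, we have $E[\nabla X_t] = 0$ and

$$\operatorname{Cov}(\nabla X_t, \nabla X_{t+h}) = 2\gamma(h) - \gamma(h+1) - \gamma(h-1),$$

which depends only on the lag $h$, so $\nabla X_t$ is stationary.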
Best Answer
This question needs some ideas from random processes and some from Fourier theory.
The autocorrelation function of a (continuous-time finite-variance) stationary random process $\{X_t\colon -\infty < t < \infty\}$ is $R_X(t) = E[X_{\tau}X_{\tau+t}]$ and the spectral density $S_X(\omega)$ is the Fourier transform of $R_X(t)$. For your problem, the independence of the $\{X_t\}$ and $\{Y_t\}$ processes gives that
$$R_Z(t) = E[Z_{\tau}Z_{\tau+t}] = E[X_{\tau}Y_{\tau}X_{\tau+t}Y_{\tau+t}] = E[X_{\tau}X_{\tau+t}]E[Y_{\tau}Y_{\tau+t}]= R_X(t)R_Y(t).$$
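You can sanity-check the factorization step with a toy discrete example: take any small joint distribution for the pair $(X_\tau, X_{\tau+t})$ and, independently of it, one for $(Y_\tau, Y_{\tau+t})$ (the supports and probabilities below are made up), and compare both sides by direct enumeration.

```python
# Toy joint distributions (values and probabilities are made up for illustration);
# the pair (X_tau, X_{tau+t}) is independent of the pair (Y_tau, Y_{tau+t}).
x_support = [(-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
x_probs = [0.2, 0.3, 0.5]
y_support = [(0.0, 1.0), (2.0, -1.0)]
y_probs = [0.6, 0.4]

# E[X_tau X_{tau+t} Y_tau Y_{tau+t}] by enumerating the (product) joint law
lhs = sum(px * py * x1 * x2 * y1 * y2
          for (x1, x2), px in zip(x_support, x_probs)
          for (y1, y2), py in zip(y_support, y_probs))

# E[X_tau X_{tau+t}] and E[Y_tau Y_{tau+t}] computed separately
r_x = sum(px * x1 * x2 for (x1, x2), px in zip(x_support, x_probs))
r_y = sum(py * y1 * y2 for (y1, y2), py in zip(y_support, y_probs))

# independence lets the expectation factor
assert abs(lhs - r_x * r_y) < 1e-12
```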
So much for random processes. From Fourier transform theory, we have that the transform of a product of two functions is the convolution of their Fourier transforms. If you are not familiar with this, see, for example, the last paragraph of Section 5.8 of the Wikipedia article on the Fourier transform. Thus, $$S_Z(\omega) = \int_{-\infty}^\infty S_X(\lambda)S_Y(\omega-\lambda)\, \mathrm d\lambda.\tag{1}$$
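Here is a small numerical illustration of the multiplication-convolution duality behind $(1)$, in the discrete (DFT) setting where everything is finite: the DFT of a pointwise product equals $\frac{1}{N}$ times the circular convolution of the DFTs.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
x = rng.standard_normal(N)   # arbitrary test sequences
y = rng.standard_normal(N)

X = np.fft.fft(x)
Y = np.fft.fft(y)

# Circular convolution of the spectra: (X * Y)[k] = sum_m X[m] Y[(k - m) mod N]
conv = np.zeros(N, dtype=complex)
for k in range(N):
    for m in range(N):
        conv[k] += X[m] * Y[(k - m) % N]

# DFT of the pointwise product equals the circular convolution divided by N
assert np.allclose(np.fft.fft(x * y), conv / N)
```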
For your particular application with a discrete-time random process (a.k.a. time series), similar results apply but in $(1)$ the limits work out to be $-\pi$ and $\pi$. (You will need to know about the discrete-time Fourier transform to get to this). Note also that you might be missing a $\frac{1}{2\pi}$ factor in the result you state as wanting to prove in your question.
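For completeness, with the discrete-time Fourier transform convention $S_X(\omega)=\sum_{h=-\infty}^{\infty} R_X(h)e^{-i\omega h}$, the discrete-time analogue of $(1)$ reads

$$S_Z(\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_X(\lambda)\,S_Y(\omega-\lambda)\,\mathrm d\lambda,$$

which is where the $\frac{1}{2\pi}$ factor comes from; other transform conventions move this factor around.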