This is a bit late, but I see that the main points in this question have not been completely addressed. I'll set
\begin{equation}
\sigma = 1
\end{equation}
for this answer.
The definition of white noise may be context-dependent: How you define it depends on what you want to do with it. There's nothing inherently wrong with saying that white noise (indexed by a set $T$) is just the process of iid standard normal random variables indexed by $T$, i.e. $E[X(t)X(s)] = \begin{cases} 1 & t = s \\ 0 & t \neq s \end{cases}.$ However, as cardinal noted here, Example 1.2.5 of Kallianpur's text shows that this process is not measurable (as a function of $(t, \omega)$). This is why, as Did commented above, $Y$ is undefined (with this definition of $X$). Thus, this definition of white noise is not appropriate for defining objects like $Y$.
Rather, you want $X$ to have covariance given by the Dirac delta. But the $\delta$ function is not a function but rather a measure and the best context for understanding it is the theory of distributions (or generalized functions---these are not to be confused with "probability distributions"). Likewise, the appropriate context for white noise is the theory of random distributions.
Let's warm up with a heuristic explanation: We'll think of white noise as the "derivative" of Brownian motion: "$dB_t/dt = X_t$". So ignoring rigor for a moment, we could write
\begin{equation}
\int_0^T h(t) X(t) dt = \int_0^T h(t) \frac{dB_t}{dt} dt = \int_0^T h(t) dB_t.
\end{equation}
The reason this isn't rigorous is that Brownian motion is nowhere differentiable. However, the theory of distributions allows us to "differentiate" non-differentiable functions. First of all, a distribution is a linear functional (linear map taking values in the real numbers) on a space of "test functions" (usually smooth functions of compact support). A continuous function $F$ can be viewed as a distribution via the pairing
\begin{equation}
(F, f) = \int_0^\infty F(t) f(t) dt.
\end{equation}
The distributional derivative of $F$ is the distribution $F'$ whose pairing with a test function $f$ is defined by
\begin{equation}
(F', f) = -(F, f').
\end{equation}
Thinking of Brownian motion as a random function, we can define white noise $X$ as its distributional derivative. Thus, $X$ is a random distribution whose pairing with a test function $f$ is the random variable
\begin{equation}
(X, f) = -(B, f') = -\int_0^\infty B(t) f'(t) dt.
\end{equation}
By stochastic integration by parts,
\begin{equation}
(X, f) = \int_0^\infty f(t) dB_t;
\end{equation}
this is the Itô integral of $f$ with respect to $B$.
Now a well-known fact in stochastic calculus is that $M_T = \int_0^T f(t) dB_t$ is a martingale starting at $M_0 = 0$, so $E (X, f) = 0$. Moreover, by the Itô isometry,
\begin{equation}
\mathrm{Var}((X, f)) = E (X, f)^2 = \int_0^\infty f(t)^2 dt.
\end{equation}
It can also be verified that $(X, f)$ is Gaussian.
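These three facts can be checked numerically. The following sketch (my own, with arbitrary grid size, sample count, and test function $f(t) = \sin^2(\pi t)$, which vanishes at both endpoints so the boundary terms vanish) discretizes Brownian motion on $[0,1]$, computes $(X, f) = -(B, f')$ by Monte Carlo, and compares the sample mean and variance with $0$ and $\int_0^1 f(t)^2\, dt = 3/8$.

```python
import numpy as np

rng = np.random.default_rng(0)

# White noise X is only defined through pairings (X, f); we approximate
# (X, f) = -(B, f') = -∫ B(t) f'(t) dt on a fine grid of [0, 1].
n, m = 2000, 5000                 # grid points, Monte Carlo samples
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)

f = np.sin(np.pi * t) ** 2        # smooth, vanishes at both endpoints
fprime = np.gradient(f, dt)

dB = rng.normal(0.0, np.sqrt(dt), size=(m, n))          # Brownian increments
B = np.concatenate([np.zeros((m, 1)), np.cumsum(dB, axis=1)], axis=1)
samples = -(B @ fprime) * dt                            # (X, f) = -(B, f')

print(samples.mean())             # ≈ 0, since E (X, f) = 0
print(samples.var())              # ≈ ∫ f(t)^2 dt = 3/8, the Itô isometry
```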
My main point is that a more appropriate definition of $Y$ might be
\begin{equation}
Y = \int_0^T h(t) dB_t.
\end{equation}
As a last note, because of the way $X$ was defined above, $X_t$ is not defined but $(X, f)$ is. That is, $X$ is a stochastic process, but one whose index set is given by $T = \{ \text{test functions} \}$ rather than $T = [0, \infty)$. Moreover, again by the Itô isometry,
\begin{equation}
E (X, f) (X, g) = \int_0^\infty f(t) g(t) dt.
\end{equation}
Abandoning rigor again, this becomes
\begin{equation}
E (X, f) (X, g) = \int_0^\infty \int_0^\infty f(s) \delta(s - t) g(t) ds dt
\end{equation}
and it is in this sense that the covariance of $X$ is the Dirac delta.
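The covariance identity can be checked in the same discretized setting as above (again a sketch of my own; the two test functions and the sample count are arbitrary): the Monte Carlo estimate of $E (X, f)(X, g)$ should match $\int_0^1 f(t) g(t)\, dt$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pair discretized white noise with two test functions f and g and
# check E[(X, f)(X, g)] ≈ ∫ f(t) g(t) dt.
n, m = 1000, 10000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)

f = np.sin(np.pi * t) ** 2
g = t * (1.0 - t)                 # another smooth function vanishing at 0 and 1

dB = rng.normal(0.0, np.sqrt(dt), size=(m, n))   # white-noise increments
Xf = dB @ f[:-1]                  # ∫ f dB as an Itô (left-endpoint) sum
Xg = dB @ g[:-1]

print(np.mean(Xf * Xg))           # Monte Carlo estimate of E (X, f)(X, g)
print(np.sum(f * g) * dt)         # ∫ f g dt, up to discretization error
```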
Edit: Note that we could leave the definition of $(X, f)$ in terms of the ordinary integral and do all the above calculations using Fubini's theorem and (ordinary) integration by parts (it's just a bit messier).
The Wikipedia article is still under construction, and still contains errors.
I am one of the editors.
The formula you cite is from the section of the article about the power spectral density of a stochastic process; it is rather sloppy and still needs to be corrected.
But the wordy definition you cite is from a different paragraph of the article. It applies first of all to an individual deterministic signal, i.e., a single sample function of the process, ignoring the existence of all the other sample functions and hence ignoring the structure of the process. Secondly, it also applies to a process, but only to the spectral decomposition of the process, not to the formula you mention.
Now, the truth is this: given any deterministic function of time $x(t)$
such that $$\lim_{T\rightarrow\infty} {1\over 2T} \int_{-T}^T x(t+\tau)\,x(t)\, dt \tag{$*$}$$ exists for all $\tau$, one can find a statistical distribution function $S$, called the power spectral distribution function of $x$, such that for almost all frequencies $f_1,f_2$,
$S(f_2)-S(f_1)$ is the amount of power contributed to $x$ by frequencies in the band $[f_1,f_2]$, in the sense of the sum of the squares of the jumps, at frequencies in that band, of $s$, the generalised Fourier transform of $x$. Here $s$ is defined as the limit in mean (i.e., the limit in an $L^2$ space, not a pointwise limit) of
$$s(\omega) = \int_{-A}^{-1} x(t)\, {e^{-i\omega t}\over it}\, dt + \int_{-1}^{1} x(t)\, {e^{-i\omega t}-1\over it}\, dt + \int_{1}^{A} x(t)\, {e^{-i\omega t}\over it}\, dt \tag{$**$}$$ as $A$ goes to infinity, with $\omega = 2\pi f$.
The first tricky bit is that $x$ will not usually have a Fourier transform, which is why we have to put a factor of $t$ in the denominator here, for convergence. If only $x$ had a Fourier transform $X$, this generalisation, $s$, would be the integral of $X$.
The second tricky bit is that even if $s$ is continuous, it might be so far from being differentiable that its "infinitesimal" jumps contribute something to the power. For this reason, the intuitive notion of "sum of squares of the Fourier coefficients of $x$" has to be interpreted as "the sum of the squares of the jumps of $s$", which, in turn, has to be interpreted as $$\lim_{\epsilon\rightarrow0} {1 \over 2\epsilon} \int_{f_1}^{f_2} \vert s(u+\epsilon) - s(u-\epsilon) \vert ^2\, du.$$ This succeeds in defining $S$ almost everywhere.
Now even if $S$ is not differentiable, it does define a distribution, and its derivative in the sense of a distribution can be defined as the power spectral density. But since $S$ can have jump discontinuities, its derivative can have delta functions in it.
==The case of a stochastic process==
Suppose now that $X(t)$ is a stochastic process. We must further assume that it is stationary (in the wide sense)---this assumption is analogous to assumption $(*)$ above for a deterministic signal. Then $X$ has a spectral decomposition, which is a rather sophisticated analogue of the Fourier transform of a deterministic function. It uses a notion of stochastic integration that is much more elementary than Itô's stochastic integral. See Gnedenko, Kurs teorii veroyatnostei (The Theory of Probability), Chapter 10, §56, p. 316. Provided these are understood in the sense of the limit in mean of stochastic processes, one can write a spectral decomposition of $X$ entirely analogous to $(**)$:
$$X(t) = \int_0^\infty \cos \omega t \, dZ_1(\omega) + \int_0^\infty \sin \omega t \, dZ_2(\omega),$$
where $$Z_1(\omega) = \lim_{T\rightarrow\infty} {1\over 2\pi} \int_{-T}^T X(t)\, {\sin \omega t\over t}\, dt$$
and
$$Z_2(\omega) = \lim_{T\rightarrow\infty} {1\over 2\pi} \int_{-T}^T X(t)\, {1-\cos \omega t\over t}\, dt.$$
Here, too, if the process is ergodic, so that time averages can be replaced by ensemble averages (taking the expectation operator $\bf E$), then the average power or variance contributed by the frequency $f$ can be found from the expected value of the jump of $Z_i$ at $f$, i.e., by studying $\mathbf E( \vert Z_i(\omega + \Delta\omega) - Z_i(\omega)\vert ^2 )$, etc. But at this point one bails out and uses a theorem of Bochner, as generalised by Khinchin to the context of stochastic processes, and sees that this equals $F(\omega + \Delta\omega) - F(\omega)$, where $F$ is the statistical distribution function given by Bochner's theorem applied to the auto-correlation function of the process $X$.
==Now, as to the formula itself==
The formula you quote,
$$S_{xx}(\omega) := \lim_{T\rightarrow\infty} \mathbf E\left[\,\left\vert {1\over\sqrt T}\int_0^T x(t)\, e^{-i\omega t}\, dt\right\vert^2\,\right],$$
is not correct. I have never seen a reliable source that proves (or even asserts) that it converges. I see it a lot on the internet and in engineering textbooks, but they never bother to establish convergence. I computed an example for a line spectrum, and it does not converge---admittedly, it should not converge, since a line spectrum does not have a power spectral density.
Consider the right-hand side without the expectation operator, as if for a deterministic signal. Then it does not converge even when the spectral density exists, since the sample paths of a noisy process have unbounded variation on every finite interval. As far as I know, one must introduce a lag-window factor to make it converge, i.e., something like Cesàro summation, but for an integral instead of a series.
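A quick numerical illustration of this non-convergence, in the obvious discrete-time analogy (a sketch of my own; the record and segment lengths are arbitrary): the raw periodogram of white noise does not settle down as the record length grows, while averaging periodograms over segments (a crude Bartlett-type windowed estimate) does.

```python
import numpy as np

rng = np.random.default_rng(2)

# Raw periodogram of white noise: its fluctuation about the true PSD does
# NOT shrink as the record grows. Averaging over segments does converge.
def periodogram(x):
    X = np.fft.rfft(x)
    return np.abs(X) ** 2 / len(x)

N = 1 << 14
x = rng.normal(0.0, 1.0, N)        # discrete white noise, true PSD = 1

raw = periodogram(x)
segs = x.reshape(64, -1)           # 64 non-overlapping segments
bartlett = np.mean([periodogram(s) for s in segs], axis=0)

print(raw[1:-1].std())             # stays near 1 no matter how large N is
print(bartlett[1:-1].std())        # shrinks like 1/sqrt(number of segments)
```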
This topic is fraught with peril: a signal contaminated with noise is modelled by a function which is continuous but nowhere differentiable, with unbounded variation on every finite interval, so Fourier inversion is never valid. More generally, because of the nature of these signals, one can never be sure it is valid to interchange two limits.
One often hears hand-waving assertions to the effect that Laurent Schwartz's method of distributions makes these formulas all right. But even with distributions, one still has to convolve with a lag window or a spectral window to obtain convergence. I have never seen proofs of these hand-waving assertions, and the only careful statement I know of, D. C. Champeney, A Handbook of Fourier Theorems, Cambridge Univ. Press (which, being a handbook, omits the proofs), does not treat stochastic processes.
Best Answer
It holds for weakly stationary stochastic processes. For MIMO linear systems, $y(t) = h(t) * x(t)$, we still have $$R_y(t) = h(t) * R_x(t) * h^T(-t)$$ or, in the frequency domain, $$S_y(f) = H(f)\, S_x(f)\, H^H(f),$$ where $H(f)$ is the Fourier transform of $h(t)$ and $H^H$ denotes its conjugate transpose.
For details, refer to Power spectral density of the system output.
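In the scalar (SISO) case this relation reduces to $S_y(f) = \vert H(f)\vert^2 S_x(f)$, which is easy to verify numerically (a sketch of my own; the FIR filter $h$, segment length, and segment count are arbitrary choices, not from the answer above):

```python
import numpy as np

rng = np.random.default_rng(3)

# SISO sanity check: for y = h * x with x unit-variance white noise
# (S_x(f) = 1), the output PSD should be S_y(f) = |H(f)|^2.
h = np.array([1.0, -0.5, 0.25])          # an arbitrary FIR impulse response

nseg, L = 1000, 512
Sy = np.zeros(L // 2 + 1)
for _ in range(nseg):
    x = rng.normal(0.0, 1.0, L + len(h) - 1)
    y = np.convolve(x, h, mode="valid")  # stationary filtered segment, length L
    Sy += np.abs(np.fft.rfft(y)) ** 2 / L
Sy /= nseg                               # averaged periodogram estimate of S_y

H = np.fft.rfft(h, L)                    # frequency response on the DFT grid
print(np.mean(np.abs(Sy / np.abs(H) ** 2 - 1)))   # small mean relative error
```

Note that the periodogram averaging here plays exactly the role of the lag window discussed in the previous answer: without it, the estimate of $S_y$ would not converge.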