The Wikipedia article is still under construction, and still contains errors.
I am one of the editors.
The formula you cite is from the section of the article on the power spectral density of a stochastic process; it is rather sloppy and still needs to be corrected.
But the wordy definition you cite is from a different paragraph of the article. It applies first of all to an individual signal, i.e., a deterministic signal, i.e., a sample function of the process, ignoring the existence of all the other sample functions and thus ignoring the structure of the process. Secondly, it applies to a process too, but only to the spectral decomposition of the process, not to the formula you mention.
Now, the truth is this: given any function of time (a deterministic function of time) $x(t)$,
such that $$\lim_{T\rightarrow\infty} {1\over 2T} \int_{-T}^T x(t+\tau)x(t)\, dt \qquad (*)$$ exists for all $\tau$, then one can find a statistical distribution function $S$, called the power spectral distribution function of $x$, such that for almost all frequencies $f_1,f_2$,
$S(f_2)-S(f_1)$ is the amount of power contributed to $x$ by frequencies in the band $[f_1,f_2]$ in the sense of the sum of the squares of the jumps at frequencies in that band of $s$, the generalised Fourier transform of $x$, defined by the limit in mean (i.e, the limit in an $L^2$ space, not a pointwise limit) of
$$s(\omega) = \int_{-A}^{-1} x(t) {e^{-i\omega t}\over it}\, dt + \int_{-1}^{1} x(t) {e^{-i\omega t}-1\over it}\, dt + \int_{1}^{A} x(t) {e^{-i\omega t}\over it}\, dt \qquad (**)$$ as $A$ goes to infinity, with $\omega = 2\pi f$.
The first tricky bit is that $x$ will not usually have a Fourier transform, which is why we have to put a factor of $t$ in the denominator here, for convergence. If only $x$ had a Fourier transform $X$, this generalisation, $s$, would be the integral of $X$.
The second tricky bit is that even if $s$ is continuous, it might be so far away from being differentiable that its "infinitesimal" jumps contribute something to the power. For this reason, the intuitive notion of "the sum of the squares of the Fourier coefficients of $x$" has to be interpreted as "the sum of the squares of the jumps of $s$", which, in turn, has to be interpreted as $$\lim_{\epsilon\rightarrow0} {1 \over 2\epsilon} \int_{f_1}^{f_2} \vert s(u+\epsilon) - s(u-\epsilon) \vert ^2\, du.$$ This succeeds in defining $S$ almost everywhere.
Now even if $S$ is not differentiable, it does define a distribution, and its derivative in the sense of a distribution can be defined as the power spectral density. But since $S$ can have jump discontinuities, its derivative can have delta functions in it.
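To make the deterministic side of this concrete, here is a minimal numerical sketch of the limit $(*)$ for an illustrative choice of signal, $x(t)=\cos(2\pi f_0 t)$ (the frequency and lag below are my own arbitrary choices, not from the text). The time average converges to ${1\over2}\cos(2\pi f_0\tau)$, a non-decaying autocorrelation whose "spectral density" consists of delta functions, i.e., jumps of $S$ at $\pm f_0$:

```python
import numpy as np

# Numerical check of the time-average limit (*) for the deterministic
# signal x(t) = cos(2*pi*f0*t).  As T grows, the average tends to
# (1/2) * cos(2*pi*f0*tau): it does not decay in tau, which is the
# signature of a line spectrum (jumps of S at +-f0).
f0 = 3.0    # illustrative signal frequency (Hz)
tau = 0.1   # illustrative lag
theory = 0.5 * np.cos(2 * np.pi * f0 * tau)
for T in (10.0, 100.0, 1000.0):
    t = np.linspace(-T, T, int(200 * T) + 1)
    dt = t[1] - t[0]
    avg = np.sum(np.cos(2*np.pi*f0*(t + tau)) * np.cos(2*np.pi*f0*t)) * dt / (2*T)
    print(T, avg, "theory:", theory)
```

The boundary terms decay like $1/T$, so already at moderate $T$ the average is close to the theoretical value.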
==The case of a stochastic process==
Suppose now that $X(t)$ is a stochastic process. We must further assume that it is stationary (in the wide sense)---this assumption is analogous to assumption $(*)$ above for a deterministic signal. Then $X$ has a spectral decomposition, which is a rather sophisticated analogue of the Fourier transform of a deterministic function. It uses a notion of stochastic integration which is much more elementary than Ito's notion of a stochastic integral. See Gnedenko, The Theory of Probability (Kurs teorii veroyatnostei), Chapter 10, § 56, p. 316. Provided the integrals are understood in the sense of the limit in mean of stochastic processes, one can write a spectral decomposition of $X$ entirely analogous to $(**)$:
$$X(t) = \int_0^\infty \cos \omega t \, dZ_1(\omega) + \int_0^\infty \sin \omega t \, dZ_2(\omega),$$
where $$Z_1(\omega) = \lim_{T\rightarrow\infty} {1\over 2\pi} \int_{-T}^T X(t) {\sin \omega t\over t}\, dt$$
and
$$Z_2(\omega) = \lim_{T\rightarrow\infty} {1\over 2\pi} \int_{-T}^T X(t) {1-\cos \omega t\over t}\, dt.$$
Now here, too, if the process is ergodic, so that time averages can be replaced by ensemble averages taken with the expectation operator $\bf E$, then the average power or variance contributed by the frequency $f$ can be found by looking at the expected squared jump of $Z_i$ at $f$, i.e., by studying $\bf E( \vert Z_i(\omega + \Delta\omega) - Z_i(\omega)\vert ^2 )$, etc. But at this point one gives up and uses a theorem of Bochner, as generalised by Khinchin to the context of stochastic processes, and sees that this is equal to $F(\omega + \Delta\omega) - F(\omega)$, where $F$ is the statistical distribution function given by Bochner's theorem applied to the auto-correlation function of the process $X$.
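A quick sanity check of the Bochner/Khinchin side of this, for an example of my own choosing (not from the text): the autocovariance $R(\tau)=e^{-|\tau|}$ of an Ornstein-Uhlenbeck process is positive definite, and its Fourier transform is the genuine (nonnegative, integrable) density $2/(1+\omega^2)$, so $F$ is differentiable here and no delta functions appear:

```python
import numpy as np

# Bochner/Wiener-Khinchin sanity check: the Fourier transform of the
# Ornstein-Uhlenbeck autocovariance R(tau) = exp(-|tau|) is the
# Lorentzian density 2 / (1 + omega**2).  Since R is even and real,
# the transform reduces to a cosine integral.
tau = np.linspace(-50, 50, 200001)
dtau = tau[1] - tau[0]
R = np.exp(-np.abs(tau))
vals = []
for omega in (0.0, 1.0, 3.0):
    S = np.sum(R * np.cos(omega * tau)) * dtau
    vals.append(S)
    print(omega, S, "theory:", 2.0 / (1.0 + omega**2))
```

The truncation to $|\tau|\le 50$ is harmless because $R$ decays exponentially; for a process whose autocovariance does not decay, no such density exists and one must work with the distribution function $F$ itself.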
==Now, as to the formula itself==
The formula you quote,
$$S_{xx}(\omega) := \lim_{T\rightarrow\infty} {\bf E}\left[ \left\vert {1\over\sqrt{T}} \int_0^T x(t) e^{-i\omega t}\, dt \right\vert^2 \right],$$
is not correct. I have never seen a reliable source that proves (or even asserts) that it converges. I see it a lot on the internet and in engineering textbooks, but they never bother to assert that it converges. I computed an example for a line spectrum: it does not converge---admittedly, it should not converge, since a line spectrum does not have a power spectral density.
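The line-spectrum divergence is easy to exhibit numerically (a sketch with an arbitrary choice of frequency, taking the deterministic signal $x(t)=\cos(\omega_0 t)$ so no expectation is needed): at $\omega=\omega_0$ the quantity inside the limit grows like $T/4$ instead of converging.

```python
import numpy as np

# Divergence of |(1/sqrt(T)) * integral_0^T x(t) e^{-i w0 t} dt|^2
# at the line frequency, for x(t) = cos(w0 * t).  Analytically the
# inner integral is T/2 plus a bounded term, so the squared, scaled
# modulus grows like T/4.
w0 = 2 * np.pi   # illustrative line frequency
for T in (10.0, 100.0, 1000.0):
    t = np.linspace(0.0, T, int(200 * T) + 1)
    dt = t[1] - t[0]
    I = np.sum(np.cos(w0 * t) * np.exp(-1j * w0 * t)) * dt
    val = abs(I / np.sqrt(T))**2
    print(T, val, "T/4 =", T / 4)
```

This is consistent with the remark above: a line spectrum has a jump in the spectral distribution function, not a density, so the formula has nothing to converge to there.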
Consider the right hand side without the expectation operator, as if for a deterministic signal. Then it does not converge even when the spectral density exists, since the sample paths of a noisy process have unbounded variation on any finite interval whatsoever. As far as I know, one must introduce a lag window factor to make it converge, i.e., something like Cesàro summation, but for an integral instead of a series.
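A discrete-time illustration of why smoothing is unavoidable (my own example, with parameters chosen arbitrarily): for white noise the true spectral density is flat, yet the raw periodogram at a fixed frequency does not settle down as the record grows; its standard deviation stays comparable to its mean. Averaging periodograms over segments (Bartlett's method, a discrete cousin of the lag window) shrinks the scatter:

```python
import numpy as np

# For unit-variance white noise, the periodogram value at a fixed
# nonzero frequency is approximately exponentially distributed with
# mean 1, whatever the record length: it is an inconsistent estimator.
# Averaging over 16 segments cuts the standard deviation by ~4x.
rng = np.random.default_rng(0)

def periodogram_at(x, k):
    n = len(x)
    return abs(np.fft.fft(x)[k])**2 / n

# 200 independent records, raw periodogram at one frequency bin
raw = [periodogram_at(rng.standard_normal(4096), 100) for _ in range(200)]
print("raw:      mean %.2f  std %.2f" % (np.mean(raw), np.std(raw)))

def bartlett(x, nseg):
    return np.mean([periodogram_at(s, 25) for s in np.split(x, nseg)])

avg = [bartlett(rng.standard_normal(4096), 16) for _ in range(200)]
print("averaged: mean %.2f  std %.2f" % (np.mean(avg), np.std(avg)))
```

Both estimators have mean near 1 (the true flat density), but only the averaged one concentrates: the price is frequency resolution, exactly the trade-off a lag or spectral window encodes.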
This topic is fraught with peril: a signal contaminated with noise is modelled by a function which is continuous but nowhere differentiable and of unbounded variation on any finite interval, so Fourier inversion is never valid. More generally, because of the nature of these signals, one can never be sure it is valid to interchange two limits.
One often hears hand-waving assertions to the effect that the use of Laurent Schwartz's method of distributions makes these formulas all right. But even with distributions, one still has to convolve with a lag window or a spectral window to make it converge. I have never seen proofs of these hand-waving assertions, and the only careful statement I know of (without proofs, but it is after all a handbook, which omits the proofs), D. C. Champeney, A Handbook of Fourier Theorems, Cambridge Univ. Press, does not treat stochastic processes.
The Fourier transform as defined by the integral $\int_{-\infty}^{\infty}f(x) e^{-iux}dx$ exists if and only if $f$ is absolutely integrable.
However, the Fourier transform can be defined in a sensible way for functions not meeting this requirement. For example, the Fourier transform can be extended to functions which are in $L^2$ but not $L^1$ using a limiting argument.
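The $L^2$ limiting argument can be seen at work on the standard example $f(x)=\sin(x)/x$, which is square integrable but not absolutely integrable (the example and the chosen frequency below are mine, for illustration). With the convention $\int f(x)e^{-iux}\,dx$ used above, its Fourier transform is $\pi$ on $(-1,1)$ and $0$ outside, and the truncated integrals converge, slowly, as the cutoff grows:

```python
import numpy as np

# Truncated Fourier integrals of f(x) = sin(x)/x at a frequency u
# inside (-1, 1).  f is in L^2 but not L^1, yet the truncations
# converge (like 1/A) to the L^2 Fourier transform, which equals pi
# there.  np.sinc(z) = sin(pi*z)/(pi*z), so sin(x)/x = np.sinc(x/pi).
u = 0.5
for A in (10.0, 100.0, 1000.0):
    x = np.linspace(-A, A, int(40 * A) + 1)
    dx = x[1] - x[0]
    val = np.sum(np.sinc(x / np.pi) * np.cos(u * x)) * dx  # even integrand
    print(A, val, "limit: pi =", np.pi)
```

The slow, oscillatory convergence here is mild compared with the stochastic case, where the sample paths force the still weaker, distributional notion of transform discussed above.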
And, as applicable to this question, the Fourier transform can also be extended to the class of "tempered distributions," also sometimes called generalized functions. Indeed, the Fourier transform is a bijective operator on this space, meaning that the Fourier transform of a tempered distribution is another tempered distribution, and the transform can be inverted. Objects such as the $\delta$ distribution are examples of tempered distributions which are not functions.
There is a fair amount of machinery which needs to be developed in order to make this rigorous. See, for example, Stein and Shakarchi's Functional Analysis or, at a more informal level, Strichartz's Guide to Distribution Theory and Fourier Transforms.
There is a fundamental mistake in both answers. It is not necessary to assume that the autocovariance function is integrable; for a non-ergodic stationary process, it will not be integrable. The reason Wiener gets credit for this theorem, instead of physicists like Schuster or Einstein, is that he was able to make rigorous sense of its Fourier transform anyway, in a new way, which he called "Generalised Harmonic Analysis", instead of using the usual notion of the Fourier transform as given by the integral you write down. (In fact, in this work he even anticipated Laurent Schwartz's notion of a distribution.) So the Wiener-Khintchine theorem states that as long as the original process $f$ is stationary and has an auto-covariance function at all, then in this new sense of the Fourier transform (which works even when the Dirichlet conditions are not satisfied), the power spectral density (which can have infinities, since it is the derivative of a function which need not be differentiable, and so only makes rigorous sense as a distribution) is the Fourier transform of the auto-covariance function.
==About power and finite energy signals==
If the signal has finite energy, the power is zero, as follows from your formulas below.
But only a transient signal can have finite energy, and the probability of sampling a transient signal from a stationary process is zero. Transient is the exact opposite of stationary.
A simple unit square wave lasting one cycle only has finite energy but zero power, and since you can easily calculate its sample auto-covariance function, you will see it is zero. It has to be, since the Wiener-Khintchine theorem says the Fourier transform of the auto-covariance is the power spectral density, and we just saw the power is zero.
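This calculation takes a few lines to check numerically (a sketch of the one-cycle square wave, with my own placement of the cycle on $[0,2)$ and an arbitrary lag): the energy is finite, so both the time-average power and the sample auto-covariance vanish as the averaging window grows.

```python
import numpy as np

# One-cycle unit square wave: +1 on [0,1), -1 on [1,2), 0 elsewhere.
# Its energy is 2, so the time-average power 2/(2T) and the sample
# auto-covariance (a fixed overlap integral divided by 2T) both go
# to zero like 1/T.
def x(t):
    return np.where((t >= 0) & (t < 1), 1.0,
                    np.where((t >= 1) & (t < 2), -1.0, 0.0))

lag = 0.25
for T in (10.0, 100.0, 1000.0):
    t = np.linspace(-T, T, int(100 * T) + 1)
    dt = t[1] - t[0]
    power = np.sum(x(t)**2) * dt / (2 * T)
    acov = np.sum(x(t + lag) * x(t)) * dt / (2 * T)
    print(T, "power:", power, "autocovariance:", acov)
```

Both columns shrink like $1/T$, in line with the summary below: finite energy forces zero power and a vanishing sample auto-covariance.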
Summarizing: finite energy (which means transient) ==> zero power ==> zero sample auto-covariance function.