The advantage of using PCA in the frequency domain is that the weights can be chosen by exploiting the cross-correlations between the signals at particular cycles.
For example (depending on the field of application), the behaviour of the variables under study can differ in the short, medium and long run. PCA in the frequency domain allows the weights to be chosen as a function of frequency.
The difference between PCA in the time domain and in the frequency domain can be understood in terms of how the eigenvalues are computed. In the time domain, they are computed from the correlation matrix. In the frequency domain, they are computed from the Fourier transform of the correlation matrix, i.e. the spectral density matrix.
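To make the contrast concrete, here is a minimal numpy sketch of the frequency-domain idea: estimate a 2x2 cross-spectral density matrix at every Fourier frequency (via a crudely smoothed periodogram) and eigendecompose it per frequency, so the weights and explained-variance shares become functions of frequency. The two simulated series and all tuning choices (smoothing width, lag, noise level) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# Two correlated series: x2 is a lagged, noisy copy of x1, so their
# cross-correlation structure varies with frequency.
x1 = np.cumsum(rng.standard_normal(n)) * 0.1 + np.sin(2 * np.pi * 0.05 * np.arange(n))
x2 = np.roll(x1, 3) + 0.5 * rng.standard_normal(n)
X = np.vstack([x1 - x1.mean(), x2 - x2.mean()])

# Raw cross-periodogram matrix: a 2x2 Hermitian matrix at each frequency.
F = np.fft.rfft(X, axis=1)                     # shape (2, n//2 + 1)
S = np.einsum('af,bf->fab', F, F.conj()) / n   # shape (n//2 + 1, 2, 2)

# Smooth across neighbouring frequencies so each S(f) has full rank
# (a raw one-taper periodogram matrix is rank one at every frequency).
kernel = np.ones(9) / 9
for a in range(2):
    for b in range(2):
        S[:, a, b] = np.convolve(S[:, a, b], kernel, mode='same')

# Time-domain PCA eigendecomposes a single correlation matrix; the
# frequency-domain version eigendecomposes S(f) at every frequency, so the
# eigenvectors (weights) differ across frequencies.
eigvals = np.linalg.eigvalsh(S)                # ascending, real (S is Hermitian)
share = eigvals[:, -1] / eigvals.sum(axis=1)   # first PC's variance share per frequency
```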
For technical applications of PCA in the frequency domain, there is a description in Jolliffe, I. T. (2002), Principal Component Analysis, 2nd Edition. Here is a link to the relevant page.
Regarding your second question: I have understood PCA by itself to be a method of finding combinations of variables that extract the maximum information in the data by maximizing the variance of the principal components. As such, it does not deal with any cyclic or frequency information in the data.
I'd like to informally try to approach a few of these.
1) Are spectral decompositions useful for modeling/forecasting, or are they typically used only for analysis purposes?
1A) When modelling, I use the spectrum to give information about the seasonal components of my data. Simplistically, I might consider a model of the form:
$$
x_{t} = m_{t} + \sum_{i=1}^{S} s_{t}^{(i)} + Y_{t}
$$
where $m_{t}$ is a mean function, the $s_{t}^{(i)}$ are $S$ seasonal components (sinusoids), and $Y_{t}$ is a zero-mean random process.
I use the spectrum to estimate the seasonal components' amplitudes and phases, and then fit an ARMA (or ARIMA) model to $Y_{t}$.
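The estimation step above can be sketched with numpy: locate the seasonal frequency as the periodogram peak, then recover amplitude and phase by regressing on cosine and sine terms at that frequency (using $A\cos + B\sin \equiv D\cos(\cdot + \phi)$). The residual is the estimate of $Y_{t}$, which one would then hand to an ARMA fitter. The simulated series, its frequency, and the AR(1) noise are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
t = np.arange(n)

# Simulated series: mean + one sinusoidal seasonal component + AR(1) noise Y_t.
true_f, true_amp, true_phase = 0.05, 2.0, 0.7
y = np.zeros(n)
for i in range(1, n):
    y[i] = 0.6 * y[i - 1] + rng.standard_normal()
x = 3.0 + true_amp * np.cos(2 * np.pi * true_f * t + true_phase) + y

# Step 1: locate the seasonal frequency as the periodogram peak.
xc = x - x.mean()
pgram = np.abs(np.fft.rfft(xc)) ** 2 / n
freqs = np.fft.rfftfreq(n)
f_hat = freqs[np.argmax(pgram[1:]) + 1]     # skip the f = 0 bin

# Step 2: estimate amplitude and phase by least squares on cos/sin
# regressors at f_hat.
C = np.column_stack([np.ones(n),
                     np.cos(2 * np.pi * f_hat * t),
                     np.sin(2 * np.pi * f_hat * t)])
beta, *_ = np.linalg.lstsq(C, x, rcond=None)
amp_hat = np.hypot(beta[1], beta[2])

resid = x - C @ beta   # estimate of Y_t, to be modelled as ARMA
```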
2) Are the forecast of spectral decompositions always some repeated periodic series?
2A) As far as I'm aware, yes. The motivation for the theory assumes that the process of interest is a discrete-parameter stochastic process of the form:
$$
X_{t} = \sum_{l=1}^{L} D_{l}\cos(2\pi f_{l}t + \phi_{l})
$$
We let $L \rightarrow \infty$ in a "nice" way.
I believe we would also add a noise term.
This is on page 127 of Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques by Percival and Walden.
The only non-sinusoidal part is at $f = 0$.
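A tiny numpy check of the point in 2A: if the fitted model is a finite sum of sinusoids whose frequencies are harmonics of a fundamental $1/P$, any forecast path simply repeats with period $P$. The frequencies, amplitudes and phases below are made up.

```python
import numpy as np

P = 40
freqs = np.array([1 / P, 2 / P, 5 / P])   # harmonics of the fundamental 1/P
amps = np.array([1.0, 0.5, 0.25])
phases = np.array([0.3, 1.1, 2.0])

def forecast(t):
    """Sum-of-sinusoids model evaluated at times t (any horizon)."""
    t = np.asarray(t, dtype=float)
    return np.sum(amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t
                                         + phases[:, None]), axis=0)

t = np.arange(200)
# Shifting the horizon by one full period reproduces the forecast exactly:
periodic = np.allclose(forecast(t + P), forecast(t))
```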
3) Would using a seasonal ARIMA likely outperform (in terms of forecasting) a spectral decomposition, even with an ARIMA model on the residuals of the spectral model? (Assuming data with strong seasonal/periodic trends.)
3A) My intuition, though I have no concrete proof, is that the ARIMA would not perform better than the spectral decomposition. The reasoning is that you should get a much better estimate of the frequencies of interest from a spectral decomposition. I'd like to reiterate: I'm not sure, though.
I'm not too sure about 4); again, my intuition is that you would need to recalculate the spectrum using the new data, as opposed to being able to update the existing spectrum.
Best Answer
It all boils down to how you want to process a time series so as to break down its components and use them for later prediction or classification.
For one, ARIMA is a parametric method (it assumes a fixed distribution) that models a stationary time series with static ARMA terms, while with wavelets you choose a wavelet function by selecting the characteristics you want it to have so as to best approximate the signal (wavelets can model non-stationary as well as stationary series). In wavelets, the filter length, the number of vanishing moments, and the symmetry of the mother wavelet relative to the signal determine how well the function represents the signal in both time and frequency. (There is a trade-off between the two: shorter filters give better time resolution but worse frequency resolution.)
In ARIMA you approximate the signal by selecting ARMA terms from the ACF and PACF, while with wavelets you approximate it by selecting a mother wavelet, a filter length and a number of vanishing moments (each decomposition level captures a particular frequency band). A stationary ARMA model thus has a pseudo-equivalent for a non-stationary signal: performing a wavelet transform at a single scale to obtain scaling and detail coefficients (roughly an ARMA with dynamic terms). The results of the two methods will differ: in ARIMA the ARMA terms are static across time, whereas a wavelet decomposition gives frequency ranges that vary across time and do not correspond directly to the ARMA terms. Although you could calculate the frequencies that the ARMA terms represent and approximate these with wavelets, the dynamic nature of the wavelet means there is no single fixed frequency across time at which the ARMA terms are represented.
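The single-scale transform mentioned above can be illustrated with a hand-rolled one-level Haar DWT (written out rather than using a wavelet library, so the example stays self-contained); the test signal with a localized high-frequency burst is invented for the demonstration. The scaling coefficients capture the low-frequency half of the band and the detail coefficients the high-frequency half, localized in time in a way that static ARMA terms are not.

```python
import numpy as np

def haar_level1(x):
    """One level of the Haar DWT: returns (scaling, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    assert len(x) % 2 == 0
    pairs = x.reshape(-1, 2)
    scaling = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # local averages
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)    # local differences
    return scaling, detail

def haar_inverse(scaling, detail):
    """Exact reconstruction from one-level Haar coefficients."""
    even = (scaling + detail) / np.sqrt(2)
    odd = (scaling - detail) / np.sqrt(2)
    return np.stack([even, odd], axis=1).ravel()

# A non-stationary signal: slow trend plus a burst of high frequency.
t = np.arange(256)
x = 0.02 * t + np.where((t > 100) & (t < 140),
                        np.sin(2 * np.pi * 0.4 * t), 0.0)

s, d = haar_level1(x)
# The detail coefficients are large only around the burst (time-localized
# frequency content), and the transform is perfectly invertible.
```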
In modeling a time series, the first issue is to define the characteristics of the signal; the second is to know which part of the signal we are most interested in (modeling the entire signal or extracting a portion of it).
Which modeling method provides a better representation of the signal, or of the portions of it you are interested in, depends on the skill of the modeler and on the properties of the signal (how easy it is to model). Sometimes a stochastic model such as ARMA will be sufficient; other times a built-in transform, with carefully selected terms, will give a better representation.