Prewhitening – Why is Prewhitening Important in Time Series Analysis?

mathematical-statistics, time series

I am writing code for geophysical time series processing. The first step is to prewhiten the values in the time domain. Why is this step important?

For example, I found this on sas.com:

If, as is usually the case, an input series is autocorrelated, the direct cross-correlation function between the input and response series gives a misleading indication of the relation between the input and response series.

I do not understand this. In my case, all values are E-field measurements over time. What does it mean for the input series to be autocorrelated?

How will it influence the Fourier transform in the next step?

Best Answer

The reason you prewhiten X is to identify a filter that transforms Y and X into y and x, where x is white noise (i.e. serially independent, or free of autocorrelation), in order to IDENTIFY an appropriate model. Note that a single filter (an ARMA model developed on X) is applied to both Y and X. With y and x you can identify a potential relationship, which is then applied to Y and X to construct a polynomial distributed lag model (PDL/ADL/DGF). Fundamentally, you are filtering Y and X so that the resulting cross-correlation between the proxies y and x can be interpreted correctly and efficiently, and then used on the observed series Y and X.

The single filter doesn't distort the causative structure. Note that differencing operators required for X and Y are not necessarily the same and are not necessarily part of the final model relating Y and X.
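As a rough sketch of the procedure described above, the following fits an autoregressive filter to X, applies that same filter to both series, and then cross-correlates the filtered results. Everything here is an illustrative assumption rather than part of the original answer: the data are simulated (an AR(1) input with a response delayed by three steps), a plain least-squares AR(p) fit stands in for a full ARMA identification, and the helper names `fit_ar`, `apply_ar_filter` and `cross_corr` are hypothetical.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit; returns coefficients for lags 1..p."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Row for time t holds x[t-1], ..., x[t-p]
    lagged = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lagged, x[p:], rcond=None)
    return phi

def apply_ar_filter(series, phi):
    """Return s[t] - sum_k phi[k] * s[t-k] for the mean-centred series s."""
    s = np.asarray(series, dtype=float) - np.mean(series)
    p = len(phi)
    out = s[p:].copy()
    for k in range(1, p + 1):
        out -= phi[k - 1] * s[p - k : len(s) - k]
    return out

def cross_corr(x, y, max_lag):
    """Sample correlation between x[t] and y[t + lag], lag = 0..max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    return np.array([np.sum(x[: n - lag] * y[lag:]) / n
                     for lag in range(max_lag + 1)])

# Simulated data (assumption): X is AR(1); Y responds to X three steps later.
rng = np.random.default_rng(0)
n = 2000
X = np.zeros(n)
for t in range(1, n):
    X[t] = 0.8 * X[t - 1] + rng.normal()
Y = np.zeros(n)
Y[3:] = 2.0 * X[:-3]
Y += rng.normal(size=n)

phi = fit_ar(X, p=1)              # filter is developed on X only
x_white = apply_ar_filter(X, phi) # approximately white
y_filt = apply_ar_filter(Y, phi)  # the SAME filter applied to Y
ccf = cross_corr(x_white, y_filt, max_lag=10)
print(np.round(ccf, 2))
```

In this simulated setup the prewhitened cross-correlation should be large only at the true delay (lag 3) and close to zero elsewhere, which is what makes the delay easy to read off. For real data the filter order would have to be chosen from the autocorrelation structure of X rather than fixed in advance.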

To illustrate this numerically, consider the GASX problem from the Box-Jenkins text, where pink marks the predictor series [image: plot of the GASX series]. A simple (2,1,0) filter was used to prewhiten, creating "adjusted" or "prewhitened" cross-correlations [image: prewhitened cross-correlations] that suggest a three-period delay, culminating in this useful equation [image: fitted model equation]. Note that, given the model form, Y is not CONDITIONALLY a function of X contemporaneously (or at lag 1 or lag 2). In simpler terms, X significantly affects Y only after two periods have passed, and not before.

In contrast, consider the simple (naive) cross-correlation between Y and X, which falsely suggests structure (induced by the autocorrelation within the series) [image: naive cross-correlations].

It is interesting to me that the most significant of these cross-correlations are at lags 3, 4 and 5, illustrating that, however flawed or contaminated, they can still be directionally important.
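The contamination of the naive cross-correlation can be reproduced on simulated data. The sketch below is an assumption-laden illustration, not the GASX data: X is a simulated AR(1) series, Y depends on X only at a delay of three steps, and `cross_corr` is a hypothetical helper. No filtering is applied, so the autocorrelation in X leaks into the neighbouring lags.

```python
import numpy as np

def cross_corr(x, y, max_lag):
    """Sample correlation between x[t] and y[t + lag], lag = 0..max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    return np.array([np.sum(x[: n - lag] * y[lag:]) / n
                     for lag in range(max_lag + 1)])

# Simulated data (assumption): X is strongly autocorrelated,
# and Y depends on X ONLY at a delay of three steps.
rng = np.random.default_rng(1)
n = 2000
X = np.zeros(n)
for t in range(1, n):
    X[t] = 0.8 * X[t - 1] + rng.normal()
Y = np.zeros(n)
Y[3:] = 2.0 * X[:-3]
Y += rng.normal(size=n)

# Naive cross-correlation of the raw, unfiltered series
naive_ccf = cross_corr(X, Y, max_lag=6)
for lag, r in enumerate(naive_ccf):
    print(f"lag {lag}: {r:+.2f}")
```

Although Y has no direct dependence on X at lags 0, 1 or 2 in this setup, the raw cross-correlations at those lags come out clearly non-zero, because X at time t is itself correlated with X at nearby times. This is the spurious structure that prewhitening is meant to remove.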