Solved – Help needed with intuition of eigenvalue spectrum of correlation matrix

correlation matrix, eigenvalues, time series

I wish to get a better understanding of the meaning of the eigenvalues of a correlation matrix I am studying.

I have a correlation matrix of noise levels for 10 cells in a wireless network over time. Let's say I have 1000 time series points for each variable. I normalise the points using a sliding-window approach, subtracting the window mean and dividing by the window S.D., and then create 1000 10×10 correlation matrices.

I have then calculated the eigenvalues for each of the 4 window sizes I used. Using the original index, I can see that the periods where I would expect the greatest amount of noise in the original time series actually have some of the smaller eigenvalue magnitudes.

What do these eigenvalues actually represent? I've read that the first eigenvalue explains the maximum amount of variance of the variables that can be accounted for with a linear model by a single underlying factor. But this means nothing to me right now: I understand variance and how it is a measure of spread, and I understand what a linear model is, but I can't piece the two parts together intuitively.

Here is a sample plot of four variables in the original time series:

And here is a sample of the largest eigenvalues using 4 different sized windows for a 10×10 matrix

What is the translation from the time series domain to the eigenvalue spectrum?

Best Answer

Your signals go up and down with a period very close to one day (1440 minutes, or 96 samples at 15 minutes each, so roughly a period of 100 on your first graph).

This translates into variations in the signals that all correlate strongly with the same 'day' variable, which explains the high value of the first eigenvalue.
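To make the link between a shared 'day' variable and a large first eigenvalue concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration, not the original measurements: the sinusoidal day factor, the per-cell loadings, the noise level, and the 96-sample period. It only uses the fact that the eigenvalues of a 10×10 correlation matrix sum to 10, so the first eigenvalue divided by 10 is the share of the total (standardised) variance carried by the first factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 10 cells, 1000 samples, 96 samples per day (15 min each).
n_cells, n_points, period = 10, 1000, 96
t = np.arange(n_points)

# Hypothetical shared 'day' factor: one sinusoid seen by every cell.
day = np.sin(2 * np.pi * t / period)

# Each cell = its own loading on the day factor + independent noise.
loadings = rng.uniform(0.8, 1.2, size=n_cells)
noise = 0.5 * rng.standard_normal((n_points, n_cells))
signals = np.outer(day, loadings) + noise

corr = np.corrcoef(signals, rowvar=False)   # 10 x 10 correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]    # eigvalsh sorts ascending; flip it

# The eigenvalues of a correlation matrix sum to its dimension (here 10),
# so this ratio is the share of variance along the first eigenvector.
share_first = eigvals[0] / n_cells
print(eigvals.round(2), round(share_first, 2))
```

With one strong common factor, the first eigenvalue should dominate and the remaining ones should all fall below 1. Raising the independent noise level in the sketch shrinks the first eigenvalue towards 1.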

This 'day' variable has the strongest effect in the 1500 min window, and less in the smaller time windows. In the 1500 min window the daily variation is measured at its full scale. Other noise that possibly correlates in (some of) the 10 cells may not persist for such a long time, and so becomes less important in the 1500 min window.

For example, the noise variation due to the on/off switching of a coffee machine (or something else of short duration, e.g. signals during lunchtime) is measured less well if you sum everything in a time window that spans an entire day.
The opposite also holds: the daily variation in noise activity is not measured well if you sum only over a time window that covers a small fraction of the day and so does not capture the daily variation.

The 1500 min window gives a smoothly varying function, which represents the (relative) impact of the daily noise on that particular day (as a moving average). It seems to vary between 5 and 7, that is, between 50% and 70%, because the eigenvalues of a 10×10 correlation matrix sum to 10. So on some days 50% of the 'total' noise was influenced by the 'daily' noise, and on others 70%. Note that this compares the day variable relative to the total amount of noise: an increase in the percentage may be due either to more noise from the day variable or to less noise from the other causes.

The other time windows give rapidly changing functions. In fact they have a period very close to half a day. This reflects the fact that the impact of the daily variation itself varies throughout the day. Say you have a 'rise' in the morning and a 'fall' in the evening (2 times a day, hence a period of half a day); then in time windows centered around those moments the noise levels of the signals will correlate the most. During the night and the middle of the day the signals change less due to the daily variation, and other noise effects have more impact.
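The window-size effect can be reproduced on synthetic data. The sketch below is only an illustration under assumed values (a pure sinusoidal day factor with a 96-sample period, equal loadings, Gaussian noise, and a hypothetical helper `top_eigenvalue_series`). It compares a window that covers a full day with one that covers a quarter of a day.

```python
import numpy as np

def top_eigenvalue_series(signals, window):
    """Largest eigenvalue of the correlation matrix in each sliding window."""
    n_windows = signals.shape[0] - window + 1
    out = np.empty(n_windows)
    for start in range(n_windows):
        corr = np.corrcoef(signals[start:start + window], rowvar=False)
        out[start] = np.linalg.eigvalsh(corr)[-1]  # ascending order, take last
    return out

rng = np.random.default_rng(1)

# Assumed setup: 10 cells, 10 days of data, 96 samples per day.
period, n_cells, n_points = 96, 10, 960
t = np.arange(n_points)
day = np.sin(2 * np.pi * t / period)
signals = day[:, None] + 0.3 * rng.standard_normal((n_points, n_cells))

full_day = top_eigenvalue_series(signals, window=period)   # whole day
short = top_eigenvalue_series(signals, window=period // 4)  # quarter day

# Estimate the dominant period of the short-window curve from its spectrum.
spectrum = np.abs(np.fft.rfft(short - short.mean()))
k = spectrum[1:].argmax() + 1          # skip the zero-frequency bin
dominant_period = len(short) / k       # in samples

print(full_day.mean(), full_day.std())
print(short.mean(), short.std(), dominant_period)
```

In this toy setting the full-day curve should stay high and nearly flat, while the quarter-day curve should swing up during the steep 'rise' and 'fall' of the sinusoid and drop near its peaks and troughs, giving a dominant period close to half a day (about 48 samples).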

Edit: one modification that you might try is scaling the RTWP of each signal by the total RTWP (individually at each time point). The resulting time series represent the relative noise in each signal at that time. The correlation is then no longer influenced by the variation of the total noise throughout the day (that type of daily correlation may be due simply to correlated activity of multiple sources of interference, rather than to correlated sensitivity of your multiple wireless cells to a single source of interference).
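A rough sketch of that scaling on synthetic data follows. The data model is an assumption for illustration: positive power values in linear units, where a shared daily activity level multiplies every cell. If the real measurements are on a logarithmic scale (for example dBm), they would presumably need converting to linear units before dividing by the total. The helper `top_share` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: 10 cells, 10 days of data, 96 samples per day.
n_cells, n_points, period = 10, 960, 96
t = np.arange(n_points)

# Hypothetical shared daily activity level (always positive) that scales
# every cell, times an independent per-cell fluctuation.
activity = 1.0 + 0.5 * np.sin(2 * np.pi * t / period)
power = activity[:, None] * rng.uniform(0.5, 1.5, size=(n_points, n_cells))

# Scale each cell by the total over all cells at the same time point.
relative = power / power.sum(axis=1, keepdims=True)

def top_share(x):
    """First eigenvalue of the correlation matrix as a share of the total."""
    corr = np.corrcoef(x, rowvar=False)
    return np.linalg.eigvalsh(corr)[-1] / x.shape[1]

share_raw = top_share(power)
share_relative = top_share(relative)
print(round(share_raw, 2), round(share_relative, 2))
```

In this toy model the shared activity cancels exactly in the ratio, so the first eigenvalue of the scaled series should fall to roughly the level expected for unrelated signals. Real data will not cancel so cleanly, and any structure that survives the scaling is the part not explained by the common total.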
