I'm a math graduate student and I have to use time series in my thesis. I don't have much background in statistics, but I have studied probability and some time series theory. So my question may be very simple for a statistician: why can we assume that white noise is normally distributed? More generally, when can we assume that a random variable is normally distributed? Is there an approximation theorem in statistics that supports this assumption (the CLT?)? Thanks!
Time Series – Analyzing Distribution of White Noise
normal-distribution, time-series, white-noise
Related Solutions
Question 1: If the power spectrum is not flat, does that mean the coloured noise is correlated?
One way to construct the power spectral density is to take the Fourier transform of the autocovariance function; this relationship is known as the Wiener–Khinchin theorem. According to the theorem, the autocovariance function is
$$ R_{xx}(t) = \int_{-\infty}^{\infty} S(f)\, e^{2\pi i t f}\, df, $$
where $S(f)$ is the power spectral density. The factor $e^{2\pi itf}$ describes a rotation around the unit circle at frequency $f$; for $t \neq 0$, integrating this rotation against a constant spectrum averages out to $0$. This implies that the autocovariance function vanishes for all $t \neq 0$ exactly when $S(f)$ is flat. To obtain the autocorrelation function, you normalise by the variance of the noise, i.e., compute $R_{xx}(t)/R_{xx}(0)$.
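A quick numerical sanity check of this (my own sketch, not part of the original answer): the inverse Fourier transform of a flat spectrum is concentrated entirely at lag $0$.

```python
# Sketch: inverse-transform a flat spectrum and confirm the
# autocovariance vanishes for all t != 0 (Wiener-Khinchin).
import numpy as np

S = np.ones(513)                  # flat power spectral density
R = np.fft.irfft(S)               # autocovariance via inverse FFT
print(np.round(R[:4] / R[0], 6))  # ~ [1, 0, 0, 0]
```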
To examine the autocorrelation functions of different colours of noise, I simulate 1000 time series of length 1000 each:
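A minimal sketch of such a simulation (assuming NumPy; the function names are mine, and instead of differencing/integrating I shape white noise directly in the frequency domain, which is equivalent up to edge effects):

```python
# Generate coloured noise by shaping white noise in the frequency domain:
# amplitude ~ f^(alpha/2), so the power spectrum scales as f^alpha
# (alpha = 0: white, -1: pink, -2: brown, +1: blue, +2: violet).
import numpy as np

rng = np.random.default_rng(0)
n_series, n = 1000, 1000

def coloured_noise(n, alpha, rng):
    white = rng.standard_normal(n)
    f = np.fft.rfftfreq(n)
    f[0] = f[1]  # avoid dividing by zero at the DC bin
    return np.fft.irfft(np.fft.rfft(white) * f ** (alpha / 2), n)

def autocorr(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for name, alpha in [("white", 0), ("pink", -1), ("brown", -2),
                    ("blue", 1), ("violet", 2)]:
    r1 = np.mean([autocorr(coloured_noise(n, alpha, rng), 1)
                  for _ in range(n_series)])
    print(f"{name:6s} mean lag-1 autocorrelation: {r1:+.2f}")
```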
As the simulated autocorrelations show, all colours of noise apart from white are autocorrelated at $t=1$. However, the correlation for blue and violet noise is negative, and for larger $t$ violet noise is not correlated. This is because violet noise is constructed by differencing white noise, so every sample is the difference of two Gaussian random numbers. If we call these Gaussian numbers $w_1, w_2, \dots$ and the samples of violet noise $v_1, v_2, \dots$, we can write
$$v_1 = w_1 - w_2, \quad v_2 = w_2 - w_3, \quad v_3 = w_3 - w_4, \quad \dots$$
Note that the Gaussian random number $w_2$ enters $v_1$ and $v_2$ with opposite signs (which explains the negative correlation) but does not enter $v_3$ (which explains why there is no correlation for $t>1$).
To illustrate this further, I have also plotted the partial autocorrelation, which is the autocorrelation when controlling for all smaller $t$'s. Here violet noise is partially autocorrelated throughout. The reason is that the estimate of $v_3$ from $v_2$ can be improved by also knowing $v_1$: $v_1$ contains information about $w_2$, which can be used to estimate the $w_3$ part of $v_2$ more accurately, which in turn is needed to estimate $v_3$. (Note that the partial autocorrelation for violet noise would be flat if $t=1$ were not included.)
Another interesting observation is that the partial autocorrelation for brown noise beyond $t=1$ is flat at 0. This is because brown noise is obtained by integrating white noise, which makes every sample the cumulative sum of all previous Gaussian random numbers. The sample one step back is not the only one carrying information about the current sample (hence the high autocorrelation for $t>1$), but once it is known, samples further back add no new information (hence the near-zero partial autocorrelation for $t>1$).
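To see the differencing/integration constructions directly, here is a small sketch (assuming statsmodels is available for the ACF/PACF estimates; the sign convention of `np.diff` differs from the text, but that does not affect the correlations):

```python
# Violet noise as differenced white noise, brown noise as integrated
# white noise; compare their ACF and PACF at small lags.
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
w = rng.standard_normal(100_000)  # white noise w_1, w_2, ...
violet = np.diff(w)               # v_k = w_{k+1} - w_k
brown = np.cumsum(w)              # cumulative sum of white noise

for name, x in [("violet", violet), ("brown ", brown)]:
    print(name, "ACF :", np.round(acf(x, nlags=3)[1:], 2))
    print(name, "PACF:", np.round(pacf(x, nlags=3)[1:], 2))
# Expected pattern: violet ACF ~ [-0.5, 0, 0] with PACF negative at
# every lag; brown ACF ~ [1, 1, 1] with PACF ~ [1, 0, 0].
```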
Pink and blue noise can be considered cases between these two extremes.
As a final aside, the autocorrelation of an infinite brown-noise time series is flat at 1, because the autocovariance $R_{xx}(t)$ and the variance $R_{xx}(0)$ diverge at the same rate.
Question 2: The plot below shows a time series that is the sum of white Gaussian noise and pink noise. I don't know whether pink noise is correlated or not. In general, does adding an uncorrelated random variable to a correlated one give a correlated or an uncorrelated output? That is, for $z = x + y$, is $z$ correlated or uncorrelated?
As you can see from the plot above, pink noise is autocorrelated. You can verify this yourself from your simulated data by shifting its values by different lags $t$ and computing the correlation at each lag. In general, if you add an uncorrelated time series to a correlated one, the resulting time series will still be correlated, but less strongly so.
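A short sketch of this shift-and-correlate recipe (assuming NumPy; the AR(1) series here is just a stand-in for a correlated signal):

```python
# Estimate lag-t autocorrelation by correlating a series with a
# shifted copy of itself; compare a correlated series x with x + y.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for k in range(1, n):            # AR(1): a simple correlated series
    x[k] = 0.7 * x[k - 1] + eps[k]
y = rng.standard_normal(n)       # uncorrelated white noise
z = x + y

def lag_corr(s, t):
    return np.corrcoef(s[:-t], s[t:])[0, 1]

for t in (1, 2, 5):
    print(f"lag {t}: x {lag_corr(x, t):+.2f}, z {lag_corr(z, t):+.2f}")
# z remains autocorrelated, but less strongly than x alone.
```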
Firstly, I am not aware of any situation where someone would be crazy enough to use a time-series model where the noise terms are marginally Gaussian but not jointly Gaussian. So while it is true that one can construct uncorrelated but not independent Gaussian terms, these are not used in time-series modelling. In time-series modelling, if we use Gaussian noise at all, we will always assume that the noise sequence is Gaussian (i.e., each finite vector of noise terms has a multivariate Gaussian distribution). Since this distribution is fully defined by its first two moments, uncorrelated noise is identical to IID noise in this case. Consequently, your (1) collapses down to (2) in practice. As to the remaining distinctions in your taxonomy, as in regression modelling (see related answer here), it is possible to obtain certain useful results for ARIMA models without specifying a noise distribution (i.e., just specifying the first two moments of the noise) but it is possible to get more results by specifying the full distribution.
As you correctly point out, often the specification of time-series models is a bit sparse, and various things are often not specified explicitly. (Time-series texts are actually quite notorious for this.) By convention, unless there are contextual cues or direct specification to the contrary, you should assume that the model is jointly Gaussian if it uses any methods that require specification of the full distribution. So any time you see maximum likelihood estimation (MLE) used in an ARIMA model, without explicit specification of a noise distribution, you can reasonably take this to mean that the assumed noise sequence is Gaussian (i.e., all finite vectors of noise elements have a multivariate Gaussian distribution).
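As an illustration of that convention (a sketch of mine, not from the answer): statsmodels' ARIMA routine fits by maximum likelihood without asking you to specify a noise distribution, and the likelihood it maximises is the Gaussian one.

```python
# Fit an AR(1) model by MLE; the reported log-likelihood is Gaussian.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
n = 500
e = rng.standard_normal(n)
y = np.zeros(n)
for k in range(1, n):
    y[k] = 0.6 * y[k - 1] + e[k]  # true AR coefficient 0.6

res = ARIMA(y, order=(1, 0, 0)).fit()
print(res.params)  # AR coefficient should come out near 0.6
print(res.llf)     # value of the maximised (Gaussian) log-likelihood
```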
Best Answer
As already noted, white noise does not need to be normally distributed, and I would expect you to find deviations from normality in any real noise if you only look hard enough (i.e., take sufficiently many samples).
However, in reality a random variable can often be regarded as the sum of many smaller random variables. For example, if we measure the speed of birds flying past a window (and consider it as a random variable), we can break it down into influences such as the species of the bird, the individual bird's fitness, and various other factors.
Now, the distributions of some of these influences may not be normal, e.g., if we have two dominant bird species of different sizes, the species influence will likely be bimodal. But if we have sufficiently many such influences contributing to our measurement, the Central Limit Theorem tells us that the distribution of the summed observable will be close to normal.
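A small sketch of this argument (my own toy numbers, assuming NumPy and SciPy): add one bimodal "species" influence to many small non-normal influences and check how normal the sum looks.

```python
# Sum a bimodal influence with many small uniform influences; for a
# normal distribution, skewness and excess kurtosis are both ~ 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100_000
species = np.where(rng.random(n) < 0.5, 8.0, 11.0)          # bimodal
others = sum(rng.uniform(-1.0, 1.0, n) for _ in range(20))  # many small
speed = species + others

print("skewness       :", round(stats.skew(speed), 3))
print("excess kurtosis:", round(stats.kurtosis(speed), 3))
```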
Due to this, we encounter many approximately normally distributed variables in applications, and thus normality is a good default assumption. Moreover, in many cases where we deviate strongly from normality (e.g., the bimodal distribution in the bird example above), it is no intellectual challenge to predict this – which further reduces the risk of errors due to falsely assuming normality.
Sidenote: This is probably the most important consequence of the Central Limit Theorem, which is sadly rarely mentioned in education. For example, standard error propagation relies on the assumption of normality and thus on the Central Limit Theorem.