Solved – How can the IID assumption be checked in a given dataset

autocorrelation, distributions, time series

1- How can I check whether a dataset can be assumed to be IID?
I'm not very familiar with statistics, but I guess I should look at the first lag of the autocorrelation to check independence. I have no idea how to check the identically distributed condition!
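As a minimal sketch of what I have in mind (in Python; the file name velocity.txt is just a placeholder for my record): a Ljung-Box test probes serial correlation, which is a necessary condition for independence, and a two-sample Kolmogorov–Smirnov test between the two halves of the record is a crude check that the marginal distribution stays the same.

```python
import numpy as np
from scipy.stats import ks_2samp
from statsmodels.stats.diagnostic import acorr_ljungbox

# placeholder: load the 1 Hz velocity record as a 1-D array
x = np.loadtxt("velocity.txt")

# independence (necessary condition): Ljung-Box test for serial
# correlation up to lag 10; a small p-value indicates dependence
lb = acorr_ljungbox(x, lags=[10], return_df=True)
print("Ljung-Box p-value:", lb["lb_pvalue"].iloc[0])

# identical distribution (crude stationarity check): compare the first
# and second halves of the record with a two-sample KS test;
# a small p-value suggests the marginal distribution has shifted
first_half, second_half = np.array_split(x, 2)
print("KS p-value:", ks_2samp(first_half, second_half).pvalue)
```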

2- It seems that I was not clear enough!
I'm trying to detect outliers in a series of records (turbulent flow velocity in a river). I transform the data into wavelet space and then shrink the wavelet coefficients against a certain threshold. Since the standard deviation is the worst option as a scale estimator (it is not robust to the very outliers I want to detect), I looked for a better estimator.
Rousseeuw and Croux developed robust estimators of dispersion for iid random variables, Sn and Qn.
I don't know offhand whether the high-breakdown properties they enjoy carry over to the time-series case.
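For reference, here is a naive O(n^2) sketch of Qn as defined by Rousseeuw and Croux (1993); the constant 2.2219 makes it consistent for the standard deviation at the Gaussian model, and the finite-sample correction factors (and the efficient O(n log n) algorithm) are omitted.

```python
import numpy as np

def qn_scale(x, c=2.2219):
    """Naive O(n^2) version of the Rousseeuw-Croux Qn scale estimator.

    Qn = c * k-th smallest of the pairwise distances |x_i - x_j| (i < j),
    where k = h*(h-1)/2 and h = floor(n/2) + 1.  The constant c ~ 2.2219
    gives consistency for sigma at the Gaussian model; small-sample
    correction factors are omitted in this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = n // 2 + 1
    k = h * (h - 1) // 2
    # all pairwise absolute differences with i < j
    diffs = np.abs(x[:, None] - x[None, :])[np.triu_indices(n, k=1)]
    return c * np.partition(diffs, k - 1)[k - 1]
```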

From the answer given by kwak, I infer that the wavelet coefficients do NOT satisfy the independence property, since after shrinkage the locations of the non-zero elements indicate the spike locations in the original time series. Am I right? (Shuffling the indices would destroy the spike locations.) If so, are other scale estimators such as the median absolute deviation (MAD) also invalid for time series, since we calculate a median?

What about the requirements of the identically distributed assumption?

3- OK, let me ask my question in a simpler way:
I want to use the robust scale estimators Sn and Qn to shrink a series of wavelet coefficients. The coefficients are obtained by decomposing observations of turbulent flow velocity vectors sampled at 1 Hz.
If the data can be assumed iid, then, for example, Qn has a breakdown point of 50% and an efficiency of 82% at the Gaussian distribution.
My question is whether these high-breakdown properties carry over to the time-series case.
Or: how can I verify that the wavelet coefficients have iid characteristics?
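This is roughly how I imagine checking it, assuming PyWavelets (pywt) and statsmodels are available; the wavelet ('db4'), the decomposition level, and the random placeholder signal are arbitrary choices standing in for my actual data.

```python
import numpy as np
import pywt
from statsmodels.stats.diagnostic import acorr_ljungbox

# placeholder signal standing in for the 1 Hz velocity record
rng = np.random.default_rng(0)
velocity = rng.standard_normal(1024)

# multilevel discrete wavelet transform; wavelet and level are arbitrary here
# wavedec returns [cA4, cD4, cD3, cD2, cD1]
coeffs = pywt.wavedec(velocity, "db4", level=4)

# test each vector of detail coefficients for serial correlation;
# a small p-value suggests the coefficients are not independent
for i, d in enumerate(coeffs[1:]):
    band = len(coeffs) - 1 - i   # cD4, cD3, ..., cD1
    lb = acorr_ljungbox(d, lags=[10], return_df=True)
    print(f"cD{band}: Ljung-Box p-value = {lb['lb_pvalue'].iloc[0]:.3f}")
```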

Best Answer

You are not framing the two problems in the right way.

Given a random dataset, i.e. a collection of $n$ observations $x_{i}\in\mathbb{R}^p$ lying in general position, you can always make the $x_{i}$ independent of one another by randomly shuffling the $n$ indices. The real question is whether you will lose information by doing this. In some contexts you will (time series, panel data, cluster analysis, functional data analysis, ...); in others you won't. That's for the first 'I' in IID.
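To illustrate the point with a small sketch (the AR(1) series is invented for the demonstration, not taken from the question): shuffling leaves the marginal distribution untouched but destroys the ordering, and with it the serial dependence that carries the time-series information.

```python
import numpy as np

rng = np.random.default_rng(1)

# a strongly dependent AR(1) series, invented for the demonstration
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

def lag1_autocorr(v):
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

# shuffling keeps the same set of values (same marginal distribution)
# but removes the serial dependence
print("lag-1 autocorrelation, original:", round(lag1_autocorr(x), 3))
print("lag-1 autocorrelation, shuffled:", round(lag1_autocorr(rng.permutation(x)), 3))
```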

The 'ID' part is also defined with respect to what you mean by a distribution. Any mixture of distributions is itself a distribution. Most often, 'ID' is used as a portmanteau for 'unimodal'.
