This is in the context of two random variables. A frequent assumption (e.g. of the error term in ANOVA) is of independent and identically distributed random variables. There is a question on this site asking how the assumption can be checked in a given dataset. Is it an assumption or a fact?

# Independence Assumptions – Is ‘Independent and Identically Distributed’ (IID) a Fact or Assumption?

Tags: assumptions, iid, independence

#### Related Solutions

First things first: more information is needed, because this question does not have a universally correct answer. Different types of distributions call for different procedures.

But just to show that this is possible, assume that each of the variables you mentioned is normally distributed, with the parameters of the normal distributions differing between any given pair.

Now take $n$ samples of each of these variables and calculate the correlation coefficient for each pair. If we cannot reject the hypothesis that these correlation coefficients are zero, we hypothesize that the variables are independent of each other (for jointly normal variables, zero correlation is equivalent to independence; in general it is only consistent with it). So we have a set of variables that are plausibly independent of each other, but that have different probability distributions.
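A minimal sketch of this check (the variable names, means, variances, and sample size are illustrative assumptions, not taken from the question):

```python
# Simulate three independent normal variables with *different* parameters,
# then test each pairwise Pearson correlation against zero.
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
samples = {
    "A": rng.normal(loc=0.0, scale=1.0, size=n),
    "B": rng.normal(loc=5.0, scale=2.0, size=n),
    "C": rng.normal(loc=-3.0, scale=0.5, size=n),
}

results = {}
for (name1, x), (name2, y) in combinations(samples.items(), 2):
    r, p = stats.pearsonr(x, y)  # H0: population correlation is zero
    results[(name1, name2)] = (r, p)
    print(f"{name1}-{name2}: r = {r:+.3f}, p = {p:.3f}")
```

With independent draws the sample correlations hover near zero (roughly within $\pm 2/\sqrt{n}$), so the zero-correlation hypothesis is typically not rejected, even though the three marginal distributions differ.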

The operational meaning of the IID condition is given by the celebrated "representation theorem" of Bruno de Finetti (which, in my humble opinion, is one of the greatest innovations of probability theory ever discovered). According to this brilliant theorem, if we have a sequence $\mathbf{X}=(X_1,X_2,X_3,...)$ with empirical distribution $F_\mathbf{x}$, if the values in the sequence are *exchangeable* then we have:

$$X_1,X_2,X_3, ... | F_\mathbf{x} \sim \text{IID } F_\mathbf{x}.$$

This means that the condition of *exchangeability* of an infinite sequence of values is the operational condition required for the values to be independent and identically distributed (conditional on some underlying distribution function). The theorem can be applied in both Bayesian and classical statistics (see O'Neill 2009 for further discussion), and in the latter case, the empirical distribution is treated as an "unknown constant" and so we usually drop the conditioning notation. Among other things, this theorem clarifies the requirement for "repeated trials" in the frequentist definition of probability.
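In symbols, the de Finetti–Hewitt–Savage theorem states that for an infinite exchangeable sequence there exists a mixing measure $\mu$ over distribution functions such that, for every $n$,

$$\mathbb{P}(X_1 \leqslant x_1, \ldots, X_n \leqslant x_n) = \int \prod_{i=1}^{n} F(x_i) \, d\mu(F),$$

i.e., the joint distribution is a mixture of IID models, which is the unconditional form of the conditional IID statement above.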

As with many other probabilistic results, the "representation theorem" actually refers to a class of theorems that apply in various different cases. You can find a good summary of the various representation theorems in Kingman (1978) and Ressel (1985). The original version, due to de Finetti, established this correspondence only for binary sequences of values. This was later extended by Hewitt and Savage (1955) to the more general version that is most commonly used (and corresponds to the version shown above). This latter representation theorem is sometimes called the de Finetti-Hewitt-Savage theorem, since it is their extension that gives the full power of the theorem. There is another useful extension by Diaconis and Freedman (1980) that establishes a representation theorem for cases of *finite exchangeability* --- roughly speaking, in this case the values are "almost IID" in the sense that there is a bounded difference between their actual probabilities and those of an IID approximation.

As the other answers on this thread point out, the IID condition has various advantages in terms of mathematical convenience and simplicity. While I do not see that as a justification of realism, it is certainly an ancillary benefit of this model structure, and it speaks to the importance of the representation theorems. These theorems give an operational grounding for the IID model, and show that it is sufficient to assume exchangeability of an infinite sequence to obtain this model. Thus, in practice, if you want to know if a sequence of values is IID, all you need to do is ask yourself, "If I took any finite set of values from this sequence, would their probability measure change if I were to change the order of those values?" If the answer is no, then you have an exchangeable sequence, and hence, the IID condition is met.

## Best Answer

In practice being independent and identically distributed is an assumption; it may sometimes be a good approximation, but it's next to impossible to demonstrate that it actually holds.

Generally, the best you can do is show that it doesn't fail too badly.

This is what diagnostics, and to some extent hypothesis tests, attempt to do. For example, someone might look at an ACF of the residuals (for data observed in sequence) to see whether there's any obvious serial correlation, which would mean that independence didn't hold; note, however, that small sample correlations don't imply independence.
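As a sketch of that diagnostic (the residual series here is simulated white noise standing in for actual model residuals, and the cutoff is the usual approximate band, not an exact test):

```python
# Compute the sample autocorrelation of a residual sequence and compare
# each lag against the approximate 95% band +/- 2/sqrt(n) that standard
# ACF plots draw under the independence hypothesis.
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations at lags 1..nlags (standard biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(1)
resid = rng.normal(size=500)           # stand-in for model residuals
acf = sample_acf(resid, nlags=10)
band = 2 / np.sqrt(len(resid))         # approximate 95% limits under independence
flagged = np.flatnonzero(np.abs(acf) > band)  # lags with "obvious" correlation
```

In practice you would usually just look at the plot (e.g. `statsmodels` provides `plot_acf`); the point is that lags inside the band only show the data are *consistent with* independence, they do not demonstrate it.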

[If you're trying to assess assumptions for some statistical procedure -- or especially if you're trying to choose between possible procedures -- it's generally best to avoid hypothesis tests for that purpose. Hypothesis tests don't answer the question you really need an answer to for such a purpose, and using the data to choose in that manner will impact the properties of whichever later procedure you choose. If you must test something like that, avoid testing the data you're running the main test on.]