Machine Learning – Clarifications on I.I.D. Assumption

iid, machine-learning, mathematical-statistics, modeling

In this question, it was stated that the assumption of i.i.d. for data comes in the form of
$$(X_i, y_i) \sim P(X, y), \quad \forall\, i = 1, \dots, N$$
$$(X_i, y_i) \;\text{independent of}\; (X_j, y_j), \quad \forall\, i \neq j \in \{1, \dots, N\}$$

I am clear on the definition of i.i.d. and the underlying concepts; however, it is still unclear to me how this assumption applies in practice.

To illustrate my confusion with an example, say we are looking at a classification problem, where $X$ is the input feature and $y$ is the label.

When we generate $n$ samples for training, I would think of it as drawing $(X_i, y_i)$ from the joint distribution of $X$ and $y$. How is the concept of independent and identical distribution relevant here, then? Aren't $(X_i, y_i)$, for $i = 1, \dots, n$, all being drawn from the same distribution of $X$ and $y$?
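To make the setup concrete, here is a minimal sketch of what "drawing from the joint distribution" could look like; the particular distribution (a Bernoulli label with a Gaussian class-conditional feature) is just a toy assumption of mine, not something from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint(n):
    """Draw n i.i.d. pairs (X_i, y_i) from a toy joint distribution P(X, y):
    y ~ Bernoulli(0.5), and X | y ~ Normal(+2, 1) or Normal(-2, 1)."""
    y = rng.integers(0, 2, size=n)                               # marginal P(y)
    X = rng.normal(loc=np.where(y == 1, 2.0, -2.0), scale=1.0)   # conditional P(X | y)
    return X, y

# Each pair uses fresh, independent randomness, so the n pairs are
# independent of each other and all share the same distribution P(X, y).
X, y = sample_joint(1000)
```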

Best Answer

The details are discussed in the On the importance of the i.i.d. assumption in statistical learning thread, but to answer your question: the $n$ samples you observed are considered random variables, so all the $(X_i, y_i)$ pairs are thought of as $n$ random variables. Only random variables can be independent or have probability distributions, so if they are "independent and identically distributed", we must be talking about random variables. To be able to reason about your data in probabilistic terms, you need to think of them as random variables. Note also that being identically distributed does not by itself imply independence: samples can share the same distribution yet still be dependent, so the independence part of the assumption is a separate, non-trivial requirement.
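To illustrate that last point, here is a toy counterexample of my own (not from the linked thread): every pair below has the same marginal distribution $P(X, y)$ as in an honest i.i.d. sample, yet the pairs are perfectly dependent, so the "sample" carries no more information than a single observation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Identically distributed but NOT independent: draw ONE pair from P(X, y)
# and repeat it n times. Each (X_i, y_i) still has the marginal law P(X, y),
# but knowing any one pair determines all the others.
y0 = rng.integers(0, 2)                            # a single label draw
x0 = rng.normal(2.0 if y0 == 1 else -2.0, 1.0)     # a single feature draw
X_dep = np.full(n, x0)
y_dep = np.full(n, y0)
```

This is why the assumption has two parts: identical distribution says each pair looks like a draw from $P(X, y)$, while independence says the draws do not inform one another.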