Does the marginal distribution assume independence? (i.e. sampling without replacement)

distributions, independence, random variable

Let $X_1, X_2, … , X_n$ be a sample drawn without replacement from a finite population. $X_1$ may be the random variable – weight of the first person; $X_2$ may be the random variable – weight of the second person, etc. Then the random variables $X_1, X_2, … , X_n$ are not mutually independent, but they are still said to be identically distributed. I can't make sense of why we'd think of $X_1, X_2, … , X_n$ as identically distributed. It seems to defy intuition. Intuitively, $X_1, X_2, … , X_n$ don't even necessarily have the same sample space, but in some sense this intuition must be incorrect. In what sense are $X_1, X_2, … , X_n$ identically distributed (have the same marginal distribution)?

Let's look at the Wikipedia definition of marginal distribution.

In probability theory and statistics, the marginal distribution of a
subset of a collection of random variables is the probability
distribution of the variables contained in the subset. It gives the
probabilities of various values of the variables in the subset without
reference to the values of the other variables. This contrasts with a
conditional distribution, which gives the probabilities contingent
upon the values of the other variables.

The key sentence here is "It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables." BUT $X_2$ is defined with reference to the first draw: $X_2$ is the weight of the second person after we've drawn the first person and kept that person out, making them unavailable to be drawn as the second person. It appears that the definition of marginal distribution assumes independent draws! And if that is true, then it assumes sampling with replacement and is altogether useless (it seems) in this case, since that is exactly contrary to the setup of this question (sampling without replacement)?

Best Answer

When I was an undergraduate, the professor in my probability class began each lecture by drawing two balls in succession (without replacement) from an urn that he brought to class. Some days, the first ball was white and the second ball black, while on other days, the first ball was black and the second ball was white. I noticed over the course of the semester that roughly half the time, the first ball was white and the second black, and half the time it was the other way around. So, I figured that the probability that the first ball was white was $0.5$ and the probability that the second ball was white was also $0.5$.

A classmate of mine was always just a tad late coming to class, so he observed only the second ball being drawn. He also noted that roughly half the time, the ball that our professor drew was white, and he too estimated the probability that the professor drew a white ball to be $0.5$. He didn't know that the ball our professor was drawing as he walked in was the second ball drawn from the urn. And yet, my classmate and I came up with the same estimate of the probability of the (second) ball being white.

At the end of the semester, our professor invited the class to examine the urn. I was surprised to discover that the urn contained only one black ball and one white ball! That explained why the draws were always (white, black) or (black, white). By golly, those draws were dependent as heck but they both had the same marginal probability $0.5$ of resulting in a white ball both for me who saw both draws and for my classmate who didn't know that he was observing the result of the second draw from the urn.
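The urn story can be replayed in a few lines of Python. This is only a rough simulation sketch: the function names `draw_two` and `estimate`, the number of trials, and the seed are all illustrative choices, not anything from the original classroom. It estimates the probability that the first ball is white, that the second ball is white, and that both are white.

```python
import random

def draw_two(rng):
    """Draw both balls, without replacement, from an urn holding one white and one black ball."""
    urn = ["white", "black"]
    rng.shuffle(urn)  # a uniformly random order of the two balls
    return urn[0], urn[1]

def estimate(trials=100_000, seed=0):
    """Estimate P(first white), P(second white), and P(both white) by simulation."""
    rng = random.Random(seed)
    first_white = 0
    second_white = 0
    both_white = 0
    for _ in range(trials):
        first, second = draw_two(rng)
        first_white += first == "white"
        second_white += second == "white"
        both_white += first == "white" and second == "white"
    return first_white / trials, second_white / trials, both_white / trials

p_first, p_second, p_both = estimate()
print(p_first, p_second, p_both)
```

Both marginal estimates should come out close to $0.5$, while the joint estimate is exactly $0$ rather than the $0.25$ that independence would require.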


More generally, in sampling without replacement from a population of $n$ distinct items, suppose that we are taking $k < n$ samples. Then the $k$ samples are all distinct. Unknown to us, God continues sampling without replacement until all $n$ items have been drawn. God's experiment has $n!$ different outcomes, each of which has probability $\dfrac{1}{n!}$. How many of these outcomes have item #i occurring in the $j$-th place? Well, of the $n!$ possible outcomes, exactly $(n-1)!$ have item #i in the $j$-th place (with the other $n-1$ items #1, #2, $\ldots$, #(i-1), #(i+1), #(i+2), $\ldots$, #n scattered about in places $1, 2, \ldots, (j-1), (j+1), \ldots, n$). So, at least in God's mind, the probability that item #i occurs in the $j$-th place is $\dfrac{(n-1)!}{n!} = \dfrac 1n$ regardless of what $j$ is. In God's mind, item #i has the same probability $\dfrac 1n$ of occurring in each of the $n$ places. To the extent that we all hope to know what is in God's mind, we should accept these calculations as correct, even though we stopped after $k$ draws and didn't complete the experiment by drawing all $n$ items and so didn't get to see what God obtained in draws numbered $k+1, k+2, \ldots, n$.
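For small $n$ the counting argument can be checked by brute force. The sketch below enumerates all $n!$ orderings and tabulates how often each item lands in each place. The helper name `position_probabilities` and the choice $n = 4$ are illustrative assumptions, and items and places are numbered from $0$ for convenience.

```python
from fractions import Fraction
from itertools import permutations

def position_probabilities(n):
    """Return table[i][j] = P(item i lands in place j) when all n! orderings are equally likely."""
    orderings = list(permutations(range(n)))
    total = len(orderings)  # n! equally likely outcomes
    counts = [[0] * n for _ in range(n)]
    for ordering in orderings:
        for place, item in enumerate(ordering):
            counts[item][place] += 1  # item occupies this place in this ordering
    return [[Fraction(c, total) for c in row] for row in counts]

table = position_probabilities(4)
for item, row in enumerate(table):
    print(item, [str(p) for p in row])
```

Every entry of the table equals $\dfrac 1n$, matching the $\dfrac{(n-1)!}{n!}$ count: each item is equally likely to appear in each place, whichever place we look at.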

Note that the events "item #i occurs in the $j$-th place" and "item #i occurs in the $j^\prime$-th place" (with $j \neq j^\prime$) are disjoint events (they cannot occur simultaneously), not independent events. Very dependent, but nonetheless equally likely.
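The same value can also be reached without completing the experiment, by averaging over the first draw. As a sketch for the second place only, write $D_1$ and $D_2$ for the labels of the items drawn first and second (notation introduced here for illustration). The term with $D_1 = i$ contributes nothing, since item #i cannot be drawn twice, so

$$P(D_2 = i) = \sum_{m \neq i} P(D_1 = m)\, P(D_2 = i \mid D_1 = m) = (n-1) \cdot \frac 1n \cdot \frac{1}{n-1} = \frac 1n.$$

The conditional probabilities $P(D_2 = i \mid D_1 = m)$ do depend on the first draw, but the marginal probability sums over every possible first draw weighted by its probability. That is the sense in which it makes no reference to the other variable, and no independence is assumed.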