Practical difference between “sampling” and “re-sampling with replacement”

Tags: distributions, probability, resampling, sampling, weighted-sampling

What is the difference between "sampling" and "re-sampling with replacement" from a practical point of view (I mean, how do I code each)?

When one says "sampling $\tilde{x}_t^{(i)}$ from $p(x_t | x_{t-1}^{(i)})$", this means that we generate $\tilde{x}_t^{(i)}$ from the conditional probability distribution $p(x_t | x_{t-1}^{(i)})$, so this is just the next random value to generate from, for example, a uniform or a Gaussian distribution, right?

Now, when one says "re-sample with replacement $N$ values $\{x_t^{(i)}; i=1,\dots,N\}$ from the set $\{\tilde{x}_t^{(i)}; i=1,\dots,N\}$ according to the associated weights/probabilities $\{w_i; i=1,\dots,N\}$", does this mean that we choose each value $\tilde{x}_t^{(i)}$ with probability $w_i$, and repeat this until we get $N$ values? How would you do that in practice, in pseudocode?

EDIT:
I understand that re-sampling here means reselecting values from the weighted set by choosing randomly, but according to the weights. So if we had three values with weights 0.99, 0.005, and 0.005, the most likely outcome of resampling would be three copies of the first value. But how do I code this in the general case?

Best Answer

I am going to try to expand my previous comment and correct some notation (let me know if I am mistaken). (Thanks for the suggestion @Macro)

1. In this case you need to know explicitly what $p(x_j\vert x_{j-1})$ means (note that $x_{j-1}$ is a known value here, and the distribution depends on it somehow). An example of this is

$$x_j\sim \mbox{Normal}(x_{j-1},1).$$

In this case, in order to sample $x_j$, you just need to sample from a normal distribution with mean $x_{j-1}$ and variance $1$. For other examples you can check the Wikipedia page on Markov chains, or look at other transition kernels.
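As a minimal sketch, this single sampling step might look as follows (the starting value `x_prev` and the function name are illustrative assumptions, not from the original text):

```python
import random

def sample_next(x_prev):
    """Sample x_j from the example kernel x_j ~ Normal(x_{j-1}, 1)."""
    # random.gauss(mu, sigma) draws from a normal with mean mu and
    # standard deviation sigma (here sigma = 1, so variance = 1)
    return random.gauss(x_prev, 1.0)

x_prev = 0.0               # assumed known previous value
x_next = sample_next(x_prev)
```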

2. A pseudocode for obtaining a sample of size $M$ with replacement is as follows

Starting with a known sample $\{x_i, i=1,...,N\}$ and weights $\{w_i\}$ summing to $1$:

(i). Partition the interval $(0,1)$ into $N$ subintervals $I_1=(0,w_1)$, $I_2=(w_1,w_1+w_2)$, ..., $I_N = (\sum_{j=1}^{N-1}w_j,1)$. This only needs to be done once. Then repeat the following steps $M$ times:

(ii). Simulate $u\sim U(0,1)$.

(iii). Identify the interval $I_j$, $j\in\{1,...,N\}$, such that $u\in I_j$.

(iv). Take the sample $x_j$.
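The steps above can be sketched in Python as follows (the function name is an illustrative choice; the cumulative sums encode the interval boundaries, and a binary search identifies the subinterval containing $u$):

```python
import random
from bisect import bisect_left

def resample(values, weights, M):
    """Draw M values with replacement according to the given weights,
    following the interval-partition method described above."""
    # (i) build the right endpoints of the subintervals I_1, ..., I_N once
    cum, total = [], 0.0
    for w in weights:
        total += w
        cum.append(total)
    out = []
    for _ in range(M):
        u = random.random() * total  # (ii) simulate u ~ Uniform(0, 1)
        j = bisect_left(cum, u)      # (iii) find the I_j containing u
        out.append(values[j])        # (iv) take the sample x_j
    return out
```

Scaling $u$ by `total` is a small safeguard in case the weights do not sum exactly to $1$ due to floating-point rounding; with normalized weights it is a no-op.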

As you can see, there is a difference between these two methods.

In the first one you have a model, $p(x_j|x_{j-1})$, and you simulate from it. Starting from, say, $x_0$, the first value $x_1$ is sampled from $p(x_1|x_0)$; then the second value $x_2$ is sampled from $p(x_2|x_1)$, and so forth. For this purpose you need to be able to simulate from the conditional distributions $p(\cdot\vert\cdot)$. That is, you obtain a sample $\{x_i, i=1,...,N\}$.
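A minimal sketch of this sequential simulation, again assuming the $\mbox{Normal}(x_{j-1},1)$ example kernel (the starting value and chain length are illustrative assumptions):

```python
import random

def simulate_chain(x0, N):
    """Simulate x_1, ..., x_N from the example model x_j ~ Normal(x_{j-1}, 1)."""
    xs = [x0]
    for _ in range(N):
        # each new value is drawn conditionally on the previous one
        xs.append(random.gauss(xs[-1], 1.0))
    return xs

path = simulate_chain(0.0, 100)  # x_0 plus the sample {x_1, ..., x_100}
```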

In the second sampling method (re-sampling) you already have a sample $\{x_i,i=1,...,N\}$ and you obtain a new sample by picking elements from the original one. This is similar to the 'urn game'. Imagine you have an urn with $N$ elements of different sizes (illustrating the different weights), you pick one of them, record the result and put it back in the urn. This experiment is repeated $M$ times.

I hope this helps.
