Solved – Systematic Sampling with unequal probabilities

probability, r, sampling, systematic

I found the sampling package in R for drawing a systematic random sample with unequal probabilities. I got my sample, but I don't really know how it was selected. Can anyone explain how a systematic sample with unequal probabilities is selected?

Here's my code:

library(sampling)

# Inclusion probabilities proportional to the size variable TOTALDOCENTES,
# for a fixed sample size of 284.
pinclusru <- inclusionprobabilities(rural$TOTALDOCENTES, 284)

# Systematic sampling with those inclusion probabilities.
indmuestrarural <- UPsystematic(pinclusru)

Best Answer

Systematic sampling is an algorithm for drawing a fixed-size sample, with unequal (or equal) probabilities, that respects a given vector of first-order inclusion probabilities. Here is how it works, with an example:

Let's say we want a sample from the population $ \{A,B,C,D,E,F\} $, and our sampling design (fixed size 3) is:

$ \begin{align*} \pi_A &= \frac{1}{3} \\ \pi_B &= 1 \\ \pi_C &= \frac{1}{6} \\ \pi_D &= \frac{2}{3} \\ \pi_E &= \frac{1}{3} \\ \pi_F &= \frac{1}{2} \end{align*} $

(Note that $ \sum_i \pi_i = 3 $, the sample size.)

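To do this with systematic sampling, let's lay our population out along an x-axis, each unit "owning" an interval whose length is its inclusion probability. Taking the units in the order A to F, the intervals are $ A: [0, \frac{1}{3}) $, $ B: [\frac{1}{3}, \frac{4}{3}) $, $ C: [\frac{4}{3}, \frac{3}{2}) $, $ D: [\frac{3}{2}, \frac{13}{6}) $, $ E: [\frac{13}{6}, \frac{5}{2}) $ and $ F: [\frac{5}{2}, 3) $: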

[Figure: systematic sampling 1, the units laid out as consecutive intervals on the x-axis]

Then we randomly draw $ u $ from a $ \mathcal{U}(0,1) $ distribution and select the units whose intervals contain $ u $, $ u+1 $ and $ u+2 $. For example, if we get $ u = 0.05 $, the three points are 0.05, 1.05 and 2.05, which gives:

[Figure: systematic sampling draw, the three points marked on the intervals]

And our sample is: $ \{A,B,D\} $.
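The draw above can be reproduced in a few lines of base R. This is a minimal sketch of the procedure as described here, not the internals of `UPsystematic`; `systematic_draw` is a hypothetical helper name, and the sketch assumes that every inclusion probability is at most 1 and that they sum to an integer.

```r
# Hypothetical helper (not part of the sampling package): returns the
# positions of the units whose intervals contain u, u + 1, ..., u + n - 1.
systematic_draw <- function(pik, u = runif(1)) {
  n <- round(sum(pik))                 # fixed sample size
  breaks <- c(0, cumsum(pik))          # interval end points along the axis
  points <- u + seq_len(n) - 1         # u, u + 1, ..., u + n - 1
  # findInterval() returns i such that breaks[i] <= x < breaks[i + 1].
  idx <- findInterval(points, breaks)
  pmin(idx, length(pik))               # guard against rounding at the far end
}

pik <- c(A = 1/3, B = 1, C = 1/6, D = 2/3, E = 1/3, F = 1/2)
selected <- names(pik)[systematic_draw(pik, u = 0.05)]
selected
# [1] "A" "B" "D"
```

As far as I know, `UPsystematic` returns a 0/1 indicator vector with one entry per unit rather than a vector of positions, so with the package the selected rows would be those where the returned vector equals 1.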

Please note that systematic sampling only controls the first-order inclusion probabilities. The second-order (joint) inclusion probabilities it induces depend on the ordering of the list and are very hard to compute in general. In addition, some of them might be equal to 0. Thus the traditional variance estimators, which need every pair of units to have a positive joint inclusion probability, don't apply in systematic sampling.
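For a population as small as the one in the example, the joint inclusion probabilities can be worked out by brute force, which makes the zero entries visible. The sketch below relies on the fact that the selected sample only changes when $ u $ crosses the fractional part of one of the interval end points; `systematic_joint` is a hypothetical helper written for this illustration, and it assumes all inclusion probabilities are at most 1.

```r
# Hypothetical helper: matrix of joint inclusion probabilities for
# systematic sampling on the list in its given order. The diagonal holds
# the first-order inclusion probabilities.
systematic_joint <- function(pik) {
  N <- length(pik)
  n <- round(sum(pik))
  breaks <- c(0, cumsum(pik))
  # Values of u at which the selected sample can change.
  cuts <- sort(unique(c(0, round(breaks %% 1, 12), 1)))
  joint <- matrix(0, N, N, dimnames = list(names(pik), names(pik)))
  for (k in seq_len(length(cuts) - 1)) {
    width <- cuts[k + 1] - cuts[k]     # probability that u falls in this piece
    u <- cuts[k] + width / 2           # any u inside the piece gives the same sample
    s <- pmin(findInterval(u + seq_len(n) - 1, breaks), N)
    joint[s, s] <- joint[s, s] + width
  }
  joint
}

pik <- c(A = 1/3, B = 1, C = 1/6, D = 2/3, E = 1/3, F = 1/2)
joint <- systematic_joint(pik)
round(joint, 3)
```

With this ordering, A and C are never selected together, and neither are A and F or C and D, so their joint inclusion probabilities are exactly 0 even though each unit has a positive first-order probability.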

Systematic sampling is a very low-entropy sampling scheme. It is often used either on a pre-shuffled population list or on a pre-ordered one (in which case it behaves like stratified sampling on the variable by which the list was ordered).
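The difference between the two uses can be seen on a toy list of 12 units with equal inclusion probabilities. This is only an illustration under assumed values (12 units, sample size 3, and a hypothetical helper `systematic_draw`); on the ordered list the draw takes exactly one unit from each consecutive block of 4, which is the stratification-like behaviour mentioned above.

```r
# Hypothetical helper: positions hit by u, u + 1, ..., u + n - 1.
systematic_draw <- function(pik, u = runif(1)) {
  n <- round(sum(pik))
  idx <- findInterval(u + seq_len(n) - 1, c(0, cumsum(pik)))
  pmin(idx, length(pik))
}

N <- 12
pik <- rep(1 / 4, N)                   # equal probabilities, sample size 3

# List kept in its sorted order: one unit from each block of 4 units.
ordered_idx <- systematic_draw(pik, u = 0.3)
ordered_idx
# [1]  2  6 10

# List shuffled first: draw on the permuted list, then map back to the
# original positions. The selected units are no longer evenly spread.
perm <- sample(N)
shuffled_idx <- sort(perm[systematic_draw(pik[perm])])
shuffled_idx
```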