Solved – Stratified random sampling when strata overlap

sampling, stratification, survey-sampling

I am sure the title might be confusing, but here is what I am dealing with:

I am running a survey at a health care center. The center has around 15k active visitors, and 5 departments operate in it.

Patients may (and in fact do) visit more than one department.

First, we calculated the sample size needed for a level of precision and confidence level (typically 5% and 95% respectively) for an overall percentage of satisfied patients.
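For reference, that overall sample size calculation can be sketched as below. This is only a minimal illustration, assuming the usual normal-approximation formula for a proportion with a finite population correction and the conservative guess p = 0.5. The function name `sample_size` and its parameters are hypothetical, and the population of 15,000 is the rounded figure quoted above.

```python
import math
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    p = 0.5 is an assumption: it maximises p * (1 - p) and so gives the
    most conservative (largest) sample size.
    """
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


n_total = sample_size(15_000)
print(n_total)  # 375 under these assumptions
```

A different prior guess for the satisfaction rate, or a different margin, changes the result, so the number should be read as an order of magnitude rather than a fixed target.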

The problem I am facing is how to distribute the resulting sample size among the 5 departments. I was hoping to use stratified random sampling, but clearly the strata overlap.

At first, I thought of the capacity of each department with respect to, let's say, number of examinations, but I am not so sure about that.

I have the number of unique patients that visited each department in a given period of time (e.g. 6 months) and also the number of examinations or visits to each department. Again, some patients visit more than one department.

How should I go about this issue?

  • Should I use some kind of stratified sampling?
  • In what way?

Any help would be greatly appreciated.

Best Answer

If you really have to divide the population into 5 strata, you need to make those strata mutually exclusive. You can achieve that by assigning each visitor to one department only, for example the one they visited the most. Consider the following fake data set, in which we are asked to split a population of 4 visitors and 2 departments into 2 strata, while the departments overlap in terms of their visitors.

visitor | department | visitedTime
----------------------------------
   1    |     a      |      5 
   1    |     b      |      4 
   2    |     a      |      0 
   2    |     b      |      2 
   3    |     a      |      9 
   3    |     b      |      1 
   4    |     a      |      0 
   4    |     b      |      5 

We see that some visitors visited only one department (2 and 4), while the other two visited both departments, which breaks the stratification. We can collapse this data set to the most visited department per visitor and get:

visitor | mostVisited
---------------------
   1    |     a      
   2    |     b      
   3    |     a      
   4    |     b      

Departments are non-overlapping in the second table. From an interpretation standpoint, I think it makes sense to ask visitors about the department they visit the most, so this strategy is easy to justify.
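The collapse step can be sketched in a few lines of Python. This is an illustration on the fake data set only. The function name `collapse_to_most_visited` is hypothetical, and the tie-break rule (alphabetical department when two departments have the same number of visits) is an assumption, since the tables above contain no ties.

```python
# (visitor, department, number of visits) rows from the fake data set
visits = [
    (1, "a", 5), (1, "b", 4),
    (2, "a", 0), (2, "b", 2),
    (3, "a", 9), (3, "b", 1),
    (4, "a", 0), (4, "b", 5),
]


def collapse_to_most_visited(rows):
    """Assign each visitor to the single department they visited most often."""
    best = {}
    for visitor, dept, count in rows:
        if count <= 0:
            continue  # a department that was never visited cannot be a stratum
        # Smaller tuple wins: more visits first, then alphabetical department
        # (the tie-break is an assumption, not part of the original answer).
        candidate = (-count, dept)
        if visitor not in best or candidate < best[visitor]:
            best[visitor] = candidate
    return {visitor: dept for visitor, (_, dept) in best.items()}


most_visited = collapse_to_most_visited(visits)
print(most_visited)  # {1: 'a', 2: 'b', 3: 'a', 4: 'b'}
```

Whatever tie-break you choose, decide on it before sampling, so that each visitor lands in exactly one stratum in a reproducible way.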

Now, you may end up with a highly unbalanced picture in which one department is always (or never) the most visited. Think about the reception: everybody has to go through it once per visit, which makes it the most visited department for nearly everyone. If you can omit such a department, you should, to make your life easier. If you can't, you can keep this department outside of the stratification, i.e. sample a certain number of visitors from it first, and then apply the "collapsed stratification" I just discussed to the remaining departments.
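Once every visitor belongs to exactly one stratum, the total sample still has to be split across the strata. The answer does not prescribe an allocation rule, so the sketch below assumes proportional allocation (each stratum gets a share of the sample proportional to its size) with largest-remainder rounding. All names, the stratum sizes 50/30/20, and the total of 10 are hypothetical. A department handled outside the stratification, such as the reception, would be sampled separately and is not shown.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, n_total):
    """Split n_total across strata in proportion to size (largest remainder)."""
    population = sum(stratum_sizes.values())
    quotas = {s: n_total * size / population for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(assignment, n_total, seed=0):
    """Draw a simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    strata = defaultdict(list)
    for visitor, dept in assignment.items():
        strata[dept].append(visitor)
    alloc = proportional_allocation({d: len(v) for d, v in strata.items()}, n_total)
    return {d: rng.sample(members, min(alloc[d], len(members)))
            for d, members in strata.items()}


# Hypothetical collapsed assignment: 50 visitors in "a", 30 in "b", 20 in "c"
assignment = {i: "a" for i in range(50)}
assignment.update({i: "b" for i in range(50, 80)})
assignment.update({i: "c" for i in range(80, 100)})

sample = stratified_sample(assignment, n_total=10)
print({d: len(v) for d, v in sample.items()})  # {'a': 5, 'b': 3, 'c': 2}
```

Proportional allocation keeps the overall estimate self-weighting. If you also need a usable estimate for each department on its own, small strata may need a larger share than this rule gives them.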
