Solved – How to deal with non-random samples

population, sample, sampling

I have a specific question about random selection, representativeness and inference. It is well known that it's necessary to use random selection to get representative samples from the population of interest. But what happens with non-random samples?

I am working with an intentional sample. I compared the means of some of the main variables in my database with official data and found no important differences between them, so my sample seems fairly representative. However, I am unsure how to deal with this sample. When I estimate the means of some variables or the correlations between them, do I need to compute CIs or p-values for these estimates?

I guess that it's not necessary, since I got the sample without random selection. Therefore, I have no sampling error, and I can't know how my estimates differ from the population values. However, I have read some papers where the authors work with non-random samples and still report CIs and p-values. Moreover, it's difficult to use multivariate techniques (like ANOVA or regression) without the help of statistical significance.

Can anybody help me? I am very confused about this matter.

Best Answer

You certainly do have sampling error, even if you don't know what population your sample came from.

As with random samples, if all you want to do is make statements about the sample itself, then you do not need p values or any form of inferential statistics. Indeed, you don't need any specific sample size either. I can measure myself and my wife and say "I am taller than she is" (N of 2). I can just measure myself and say "I am 5 foot 8" (N = 1).

However, even with non-random samples you are usually interested in inference. You therefore have to assume either a) that the non-randomness in your sample isn't affecting things (a dangerous assumption!) or b) that there is some population from which your sample is random, and that that population is interesting.
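To make the danger in assumption (a) concrete, here is a rough simulation sketch in Python. Everything in it is made up for illustration: the synthetic population, the helper names (`mean_ci`, `coverage`, `random_selection`, `biased_selection`), and in particular the selection mechanism, which assumes that units with larger values are more likely to end up in the sample. It computes the usual normal-approximation 95% CI for a mean and checks how often that interval contains the true population mean under each way of selecting units.

```python
import math
import random
import statistics


def mean_ci(values, z=1.96):
    """Sample mean with a normal-approximation 95% CI.

    The interval is only meaningful if `values` can be treated as a
    random sample from the population of interest.
    """
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


def random_selection(population, n, rng):
    # Simple random sample without replacement.
    return rng.sample(population, n)


def biased_selection(population, n, rng):
    # Hypothetical non-random mechanism (an assumption for this sketch):
    # the chance of being selected is proportional to the unit's value.
    weights = [max(v, 0.0) for v in population]
    return rng.choices(population, weights=weights, k=n)


def coverage(select, population, n=200, reps=500, seed=0):
    """Share of repeated samples whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    hits = 0
    for _ in range(reps):
        _, lo, hi = mean_ci(select(population, n, rng))
        hits += lo <= true_mean <= hi
    return hits / reps


# Synthetic population, invented purely for illustration.
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(5000)]

print("coverage, random selection:", coverage(random_selection, population))
print("coverage, biased selection:", coverage(biased_selection, population))
```

In both cases the estimates still vary from sample to sample, so sampling error is present either way. The difference is that under the biased mechanism the intervals are centred on the wrong value, so the nominal 95% no longer describes how often they contain the population mean.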

In real life, this often gets blurred. In the many cases where there is no way to take a random sample (too expensive, too impractical, unethical, illegal, impossible) people often write as if they are inferring to something sort of in between a and b.
