Sample size for mediation analysis
My study examines the relationship between students' college persistence, their perception of student affairs services, and their academic integration. It was suggested that I use mediation analysis. What is the best sample size for a mediation analysis? My population currently numbers 2,338.
Related Solutions
Some very basic sample size calculations are discussed here:
http://www.itl.nist.gov/div898/handbook/prc/section2/prc222.htm
http://www.itl.nist.gov/div898/handbook/prc/section2/prc242.htm
For a basic introduction, the book by Moore and McCabe, Introduction to the Practice of Statistics, covers some of the basics in chapter 6.
A deeper discussion is in:
Russell V. Lenth (2001).
"Some Practical Guidelines for Effective Sample Size Determination".
The American Statistician, August 2001, Vol. 55, No. 3.
Edit:
I just noticed that I had left out Jacob Cohen's book, which I meant to mention: Statistical Power Analysis for the Behavioral Sciences.
I don't know this book myself, but I've seen some people recommend it:
Chow S, Shao J, Wang H. 2008.
Sample Size Calculations in Clinical Research.
2nd Ed. Chapman & Hall/CRC Biostatistics Series.
As I said in the comments, I tend to use simulation.
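A minimal sketch of what simulation-based power looks like in R: generate many data sets under assumed values, run the planned test on each, and count how often it rejects. The mean difference of 5 and standard deviation of 10 are assumptions borrowed from the t-test example further down, and `sim_power_t` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: Monte Carlo power of a two-sample, equal-variance t-test.
# Assumed inputs: n per group, true mean difference `delta`, common `sd`.
sim_power_t <- function(n, delta, sd, alpha = 0.05, reps = 2000) {
  rejected <- replicate(reps, {
    control <- rnorm(n, mean = 0, sd = sd)
    treated <- rnorm(n, mean = delta, sd = sd)
    t.test(treated, control, var.equal = TRUE)$p.value < alpha
  })
  mean(rejected)  # proportion of simulated studies that reject H0
}

set.seed(1)
power_30 <- sim_power_t(n = 30, delta = 5, sd = 10)
power_64 <- sim_power_t(n = 64, delta = 5, sd = 10)
c(n_30 = power_30, n_64 = power_64)  # the n = 64 value should land near 0.8
```

To find a sample size this way, you would increase n until the simulated power reaches the target. The same recipe works for designs, such as mediation models, that have no simple closed-form formula.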
The sample size formula depends on the test that you want to carry out. The one you mention looks like a questionnaire/survey study type. That formula accounts for random sampling error, which is expressed by the margin of error: how far the survey result may plausibly be from the true population value.
The article "Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies" by C. C. Serdar, M. Cihan, D. Yucel and M. A. Serdar (2021) provides very helpful information about it.
"The margin of error expresses the amount of random sampling error in the survey results".
You will see in this article that even if the population size is unknown, an approximation is enough, because the required sample size grows more and more slowly as the population gets bigger.
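As an illustration of that point, here is a sketch using the common Cochran formula for estimating a proportion, with a finite population correction. That it matches the exact formula in Serdar et al. is an assumption; the 5% margin of error, 95% confidence and p = 0.5 (the most conservative choice) are assumed defaults, and `sample_size_moe` is a hypothetical name.

```r
# Cochran's formula with a finite population correction (assumed defaults)
sample_size_moe <- function(N, margin = 0.05, conf = 0.95, p = 0.5) {
  z  <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for 95% confidence
  n0 <- z^2 * p * (1 - p) / margin^2      # sample size for an infinite population
  n  <- n0 / (1 + (n0 - 1) / N)           # shrink it for a finite population of size N
  ceiling(n)
}

sapply(c(500, 2338, 10000, 1e6), sample_size_moe)
# 218 331 370 384
```

Going from a population of 10,000 to 1,000,000 raises the required sample by only 14, which is why a rough population figure is usually enough.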
The example that you gave (new disease) is not a questionnaire/survey type but a test for proportions (or means), which uses a different sample size formula: one in which the population size is not involved, but power is.
In R, this can be done using the power.prop.test function (I assumed that you wanted to test proportions, but a similar function exists for the t-test: power.t.test). It can also be done with the pwr package, which offers the same possibilities.
The same R function is used to determine either the sample size or the statistical power: you leave out the argument you want to solve for. Here we are looking for the sample size, so we do not pass n as an argument.
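For example, with power.prop.test from base R's stats package (the proportions 0.50 and 0.75 are made-up values for illustration): leaving out n solves for the sample size per group, while supplying n and leaving out power solves for power instead.

```r
# Hypothetical proportions: 50% in one group vs 75% in the other.
# Omit n -> solve for the sample size per group.
n_needed <- power.prop.test(p1 = 0.50, p2 = 0.75, sig.level = 0.05,
                            power = 0.90)$n
# Omit power -> solve for the power achieved with 50 per group.
power_at_50 <- power.prop.test(n = 50, p1 = 0.50, p2 = 0.75,
                               sig.level = 0.05)$power
c(n_needed = n_needed, power_at_50 = power_at_50)  # about 76.7 and 0.740
```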
For instance, say that the standard protein level is 100 units and you want to detect a change in means of 5 units (delta). You have estimated that the standard deviation is 10 (sd). If you want 80% power in a two-sided, two-sample test at the 5% level, you will need a sample size of 64 in each group (63.77, rounded up). The estimate of the standard deviation is based on your knowledge of the domain you are investigating, previous studies, or research data.
power.t.test(delta=5, sd=10, sig.level=0.05, power=0.8, alternative="two.sided")
##
## Two-sample t test power calculation
##
## n = 63.76576
## delta = 5
## sd = 10
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
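For intuition about where the 64 comes from, here is a sketch comparing the textbook large-sample approximation, n ≈ 2(z_{1−α/2} + z_{1−β})² / d² per group with d = delta / sd, against the noncentral-t answer from power.t.test. The approximation formula is standard but is not part of the original answer.

```r
alpha        <- 0.05
target_power <- 0.80
d            <- 5 / 10  # Cohen's d: delta / sd

# Normal (large-sample) approximation, per group
n_approx <- 2 * (qnorm(1 - alpha / 2) + qnorm(target_power))^2 / d^2

# power.t.test works with the noncentral t distribution, so it needs a bit more
n_exact <- power.t.test(delta = 5, sd = 10, sig.level = alpha,
                        power = target_power)$n

c(approx = n_approx, exact = n_exact)  # about 62.8 and 63.8; round up to 64
```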
The pwr package provides the same information, but instead of taking delta (difference in means: 5 in our example) and sd (estimated standard deviation: 10) as arguments, it takes only the effect size (Cohen's d), which here is the difference in means divided by the estimated standard deviation, i.e. 5/10 = 0.5.
library(pwr)  # install.packages("pwr") first if the package is not installed
pwr.t.test(d=.5, sig.level=0.05, power=0.8, alternative="two.sided")
Best Answer
Larger samples are almost always better, so the best sample size is the entire population. I guess what you mean to ask is, "Is this sample large enough?" I imagine that by mediation analysis you mean the Baron and Kenny (1986) style of mediation analysis with three variables and a series of regressions. I also imagine that your variables, being things like "college persistence" and "academic integration", are self-reports on discrete scales with lots of variability. So I expect that 2,338 subjects is quite enough, by which I mean that the results of such an analysis are unlikely to change if you enlarged the sample.
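A minimal sketch of that series of regressions in R, run on simulated data. The role assignment (perception of student affairs services as X, academic integration as the mediator M, persistence as Y), the path values and the variable names are all assumptions for illustration. The indirect effect a × b is tested here with the Sobel z statistic; bootstrap confidence intervals are a common alternative.

```r
set.seed(42)
n <- 2338  # same size as the population in the question

# Simulated, hypothetical data: X -> M -> Y plus a direct X -> Y path
services    <- rnorm(n)                                       # X
integration <- 0.4 * services + rnorm(n)                      # M, assumed a = 0.4
persistence <- 0.1 * services + 0.3 * integration + rnorm(n)  # Y, assumed c' = 0.1, b = 0.3

step1 <- lm(persistence ~ services)                # total effect c
step2 <- lm(integration ~ services)                # path a
step3 <- lm(persistence ~ services + integration)  # path b and direct effect c'

a    <- coef(step2)[["services"]]
se_a <- coef(summary(step2))["services", "Std. Error"]
b    <- coef(step3)[["integration"]]
se_b <- coef(summary(step3))["integration", "Std. Error"]

indirect <- a * b
sobel_z  <- indirect / sqrt(b^2 * se_a^2 + a^2 * se_b^2)  # first-order Sobel test

c(total = coef(step1)[["services"]], direct = coef(step3)[["services"]],
  indirect = indirect, sobel_z = sobel_z)
```

With ordinary least squares on the same cases, the total effect equals the direct effect plus a × b exactly, which makes a quick sanity check on the output.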
You will notice in the literature on social psychology that mediation analyses are routinely conducted with much smaller samples, on the order of 100 subjects.
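To check "large enough" for a specific design, here is a simulation sketch of power for the indirect effect using the joint significance test (both the a and b paths significant at the 5% level), which is one of several ways to test mediation. The path values a = b = 0.3 and c' = 0.1, with standard normal variables and errors, are hypothetical, as is the helper `sim_mediation_power`; actual power depends heavily on the true path sizes.

```r
# Hypothetical helper: simulated power of the joint-significance test
# for the indirect effect a * b. Path values are assumptions.
sim_mediation_power <- function(n, a = 0.3, b = 0.3, c_prime = 0.1,
                                alpha = 0.05, reps = 500) {
  significant <- replicate(reps, {
    x <- rnorm(n)
    m <- a * x + rnorm(n)
    y <- c_prime * x + b * m + rnorm(n)
    p_a <- coef(summary(lm(m ~ x)))["x", "Pr(>|t|)"]
    p_b <- coef(summary(lm(y ~ x + m)))["m", "Pr(>|t|)"]
    p_a < alpha && p_b < alpha  # both paths must be significant
  })
  mean(significant)
}

set.seed(2024)
power_100  <- sim_mediation_power(n = 100)
power_2338 <- sim_mediation_power(n = 2338)
c(n_100 = power_100, n_2338 = power_2338)
```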
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. doi:10.1037/0022-3514.51.6.1173