If one wants to perform an A/B test with a small base rate, and not just for funsies, one has to ask what effect size, i.e. which absolute improvement, is considered worth the effort.
For example, if p = 1/10^6 and the number of visitors per month is 10^6, then even a relative improvement of 500 % (the rate becoming five times the baseline) means an absolute improvement of only 4 more conversions per month on average. If such differences cannot be justified with monetary arguments (e.g. the website is selling trips to space), an A/B test is not worth the trouble.
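To make that concrete, here is a quick back-of-the-envelope check in R, using the numbers above and reading the 500 % improvement as the rate becoming five times the baseline:
visitors   <- 1e6            # visitors per month
p_baseline <- 1e-6           # baseline conversion rate
p_improved <- 5 * p_baseline # rate after a 500 % relative improvement
visitors * p_baseline        # about 1 expected conversion per month
visitors * p_improved        # about 5, i.e. only 4 more on average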
However, if such differences are considered worth the effort, I suggest breaking down / decomposing the conversion rate into its participating factors. For example, let's say that one measures conversions as $\frac{\text{boughtSpaceTrips}}{\text{siteVisitors}}$. This rate can be split into ...
$\frac{\text{boughtSpaceTrips}}{\text{siteVisitors}} = \frac{\text{boughtSpaceTrips}}{\text{spaceTripsInBasket}} \cdot \frac{\text{spaceTripsInBasket}}{\text{siteVisitors}}$
This decomposition may allow one to detect differences in one of the decomposed ratios that do not show up in the composed ratio, either because they are countered by the other ratios (negative correlation) or because their contribution weight is so small that detecting them via the composed ratio requires the aforementioned ton of data. Whether there is some sort of negative correlation between the decomposed factors can be decided by applying domain knowledge, for example, how much it "costs" the user to perform a certain action.
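To illustrate the masking effect, here is a small sketch with purely made-up counts (the variants and numbers below are hypothetical, not from the question):
# Hypothetical numbers: variant B improves the basket-to-purchase step,
# but fewer visitors put a trip into the basket, so the composed rate barely moves.
visitors  <- c(A = 1e6, B = 1e6)
in_basket <- c(A = 200, B = 150)   # spaceTripsInBasket
bought    <- c(A = 20,  B = 21)    # boughtSpaceTrips
bought / in_basket     # 0.100 vs 0.140 -- clear difference in this factor
in_basket / visitors   # 2.0e-04 vs 1.5e-04 -- moves in the opposite direction
bought / visitors      # 2.0e-05 vs 2.1e-05 -- composed rate is nearly unchanged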
In the given constructed example, the reasoning
Improve $\frac{\text{boughtSpaceTrips}}{\text{spaceTripsInBasket}}$ => Improve $\frac{\text{boughtSpaceTrips}}{\text{siteVisitors}}$
is valid, but the other way around
Improve $\frac{\text{spaceTripsInBasket}}{\text{siteVisitors}}$ => Improve $\frac{\text{boughtSpaceTrips}}{\text{siteVisitors}}$
is not.
If the decomposition does not lead to more feasible base rates, then take a look at what the statistical literature offers for this kind of problem (keyword: "rare event(s)"). But in that case you go beyond the scope of normal A/B tests, so I would ask again whether this is worth the effort. As an aside, my intuition tells me that one cannot avoid the pillars of the universe, so rare events still require a lot of data (but maybe not a ton), no matter which fancy method is applied (domain knowledge may help a lot, though).
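To put a rough number on "a lot of data", one can ask base R's normal-approximation power calculation how many visitors per group it would take to detect even a fivefold improvement of such a rare event (the rates are the made-up ones from above, and the normal approximation is shaky at these rates, so treat the result as an order of magnitude only):
# Rough order-of-magnitude check: 1 in a million vs. 5 in a million,
# 5 % significance level, 80 % power
power.prop.test(p1 = 1e-6, p2 = 5e-6, sig.level = 0.05, power = 0.8)
This should come out to somewhere around three million visitors per group, which is in line with the intuition above.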
I have re-thought your problem and found Friedman's test, which is a non-parametric version of a one-way ANOVA with repeated measures.
I hope you have some basic skills with R.
# Creating a source data.frame (one row of 11 values per time point)
my.data <- data.frame(
  value = c(2, 7, 7, 3, 6, 3, 2, 4, 4, 3, 14,                      # baseline
            167, 200, 45, 132, NA, 245, 199, 177, 134, 298, 111,   # post1
            75, 43, 23, 98, 87, NA, 300, NA, 118, 202, 156,        # post2
            23, 34, 98, 112, NA, 200, NA, 156, 54, 18, NA),        # post3
  post.no = rep(c("baseline", "post1", "post2", "post3"), each = 11),
  ID = rep(c(1:11), times = 4))
# install the pgirmess package if needed; it provides friedmanmc() for the post-hoc comparisons
library(pgirmess)
Perform Friedman's test ...
friedman.test(my.data$value,my.data$post.no,my.data$ID)
Friedman rank sum test
data: my.data$value, my.data$post.no and my.data$ID
Friedman chi-squared = 14.6, df = 3, p-value = 0.002192
and then find between which groups the differences exist with a non-parametric post-hoc test.
Here you have all possible comparisons.
friedmanmc(my.data$value,my.data$post.no,my.data$ID)
Multiple comparisons between groups after Friedman test
p.value: 0.05
Comparisons
obs.dif critical.dif difference
baseline-post1 25 15.97544 TRUE
baseline-post2 21 15.97544 TRUE
baseline-post3 20 15.97544 TRUE
post1-post2 4 15.97544 FALSE
post1-post3 5 15.97544 FALSE
post2-post3 1 15.97544 FALSE
As you can see, only the baseline (first time point) is statistically different from the others.
I hope this will help you.
Best Answer
Let's take a stab at a first-order approximation assuming simple random sampling and a constant proportion of infection for any treatment. Assume the sample size is large enough that a normal approximation can be used in a hypothesis test on proportions, so we can calculate a z statistic like so:
$z = \frac{p_t - p_0}{\sqrt{p_0(1-p_0)(\frac{1}{n_1}+\frac{1}{n_2})}}$
This is the sample statistic for a two-sample test, new formula vs. bleach, since we expect the effect of bleach to be random as well as the effect of the new formula.
Then let $n = n_1 = n_2$, since balanced experiments have the greatest power, and use your specifications that $|p_t - p_0| \geq 0.1$, $p_0 = 0.2$. To attain a test statistic $|z| \geq 2$ (Type I error of about 5%), this works out to $n \approx 128$. This is a reasonable sample size for the normal approximation to work, but it's definitely a lower bound.
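To reproduce that number, rearrange the formula above with $n_1 = n_2 = n$ and solve for $n$; in R:
# Solve |z| >= 2 for n, with n1 = n2 = n, p0 = 0.2, |p_t - p0| = 0.1
z <- 2; p0 <- 0.2; delta <- 0.1
n <- z^2 * 2 * p0 * (1 - p0) / delta^2
n   # 128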
I'd recommend doing a similar calculation based on the desired power for the test to control Type II error, since an underpowered design has a high probability of missing an actual effect.
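One convenient way to do that, taking $p_t = 0.3$ as the alternative and the conventional 80 % power (both assumptions on my part), is base R's power.prop.test:
# Per-group sample size for 80 % power at the 5 % level (normal approximation)
power.prop.test(p1 = 0.2, p2 = 0.3, sig.level = 0.05, power = 0.8)
This should land somewhere around 290-300 subjects per group, noticeably more than the 128 lower bound above.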
Once you've done all this basic spadework, start looking at the stuff whuber addresses. In particular, it's not clear from your problem statement whether the samples of poultry measured are different groups of subjects, or the same groups of subjects. If they're the same, you're into paired t test or repeated measures territory, and you need someone smarter than me to help out!