Sample size calculation for truncated normal distribution

Tags: epidemiology, r, sample-size

So, I am new here.

I need to perform a sample size calculation for a clinical trial. The study sample will be selected according to a height criterion: people within a particular height range (females, 1.6 m to 1.7 m) will be invited to participate in the trial. We know the expected sample standard deviation from a previous trial. My concern is that the sample is not from a normal distribution. The usual power/sample size calculations assume that the test statistic is normally distributed under $H_0$ and $H_1$, but here I believe we have a truncated normal distribution. How can I modify power.t.test, or make some other calculation in R, to accommodate this? My colleague says to just rely on the central limit theorem and assume a normal distribution with mean 1.65 m and the known standard deviation, but I believe this is wrong because of the truncation. Any advice would be appreciated.

Best Answer

Your colleague is correct.

In the US, 1.6 to 1.7 m is near the middle of the range of adult female heights. According to Wolfram Alpha, which summarizes NHANES 2006 data, the height distribution in this range should look close to this:

[Figure: Female heights between 1.6 and 1.7 m]

This is extremely close to uniform: its mean is 1.649 m and its standard deviation is 0.0287 m (whereas a uniform distribution in this range would have a mean of 1.650 m and SD of 0.0289 m). Its skewness coefficient is only 0.054.
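The uniform figures quoted above follow from the closed-form moments of a uniform distribution, and the near-uniformity of a normal curve cut to a narrow central window can be checked numerically. The following is a minimal sketch in Python (standard library only), not the calculation used for the numbers above. The population mean and SD in it (`POP_MEAN`, `POP_SD`) are hypothetical values chosen for illustration, not the NHANES figures, and the function names are invented for this example.

```python
import math
from statistics import NormalDist


def uniform_moments(lo, hi):
    """Mean and SD of a uniform distribution on [lo, hi] (its skewness is 0)."""
    return (lo + hi) / 2, (hi - lo) / math.sqrt(12)


def truncated_normal_moments(mu, sigma, lo, hi, steps=20_000):
    """Mean, SD and skewness of Normal(mu, sigma) truncated to [lo, hi].

    Uses midpoint-rule numerical integration of the normal density.
    """
    dist = NormalDist(mu, sigma)
    width = (hi - lo) / steps
    xs = [lo + (i + 0.5) * width for i in range(steps)]
    ws = [dist.pdf(x) for x in xs]  # unnormalised weights on the grid
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    third = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs)) / total
    return mean, math.sqrt(var), third / var ** 1.5


# Hypothetical population parameters (assumption, for illustration only).
POP_MEAN, POP_SD = 1.63, 0.07

u_mean, u_sd = uniform_moments(1.6, 1.7)
t_mean, t_sd, t_skew = truncated_normal_moments(POP_MEAN, POP_SD, 1.6, 1.7)

print(f"uniform:          mean = {u_mean:.3f} m, SD = {u_sd:.4f} m")
print(f"truncated normal: mean = {t_mean:.3f} m, SD = {t_sd:.4f} m, "
      f"skewness = {t_skew:.3f}")
```

Changing `POP_MEAN` and `POP_SD` shows how far the window can sit from the centre of the population before the truncated distribution departs noticeably from uniform.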

Accordingly, independent samples drawn from this distribution will have means that are close to normally distributed. Here, for instance, is a histogram of means of 10,000 samples of just four heights drawn (independently) from this distribution:

[Figure: Histogram of means]

It is only very, very slightly non-normal (a Kolmogorov-Smirnov test rejects normality at p=0.94%, which is amazingly large given there are 10,000 data points). For the purpose of planning comparisons of mean heights among random groups of women, the normal approximation will work well. Standard power calculations ought to give good guidance.
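For readers who want to reproduce the flavour of this check, here is a rough sketch in Python (standard library only). It makes two assumptions that are not in the answer: a uniform distribution on [1.6, 1.7] stands in for the actual height distribution, and the difference to detect (`DELTA`) is a made-up value. It simulates means of four draws, summarises their shape, and then applies the textbook normal-approximation formula for a two-group comparison of means, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2.

```python
import math
import random
from statistics import NormalDist


def simulate_sample_means(n_per_sample=4, n_samples=10_000,
                          lo=1.6, hi=1.7, seed=1):
    """Means of independent samples drawn uniformly from [lo, hi]."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(lo, hi) for _ in range(n_per_sample)) / n_per_sample
        for _ in range(n_samples)
    ]


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


means = simulate_sample_means()
skew, kurt = skewness_and_excess_kurtosis(means)
print(f"sample means: skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")

# SIGMA is the uniform SD on [1.6, 1.7]; DELTA is a hypothetical difference.
SIGMA, DELTA = 0.0289, 0.01
print("n per group:", n_per_group(DELTA, SIGMA))
```

Because it uses normal rather than t quantiles, this formula gives a slightly smaller n than a t-based calculation such as power.t.test in R, so it is best read as a sanity check on that output rather than a replacement for it.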
