Linear regression sample size advice

Tags: clinical-trials, regression, sample-size, statistical-power

I'm involved in the sample size calculation for an oncology clinical trial. Our primary outcome is quality of life (QOL; quantitative, normal).
There are two treatment regimens, so I started with a simple t-test sample size calculation. With an effect size of 0.55, power = 0.8 and alpha = 0.05, we need 106 participants overall. Allowing for loss to follow-up and death before QOL measurement, the accrual target rises to 152 participants.
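The figures above can be reproduced approximately with a few lines of Python. This is a minimal sketch, not the calculation actually used for the trial: it assumes a two-sided test with equal allocation, uses the normal approximation with Guenther's small-sample correction in place of an exact noncentral-t computation, and assumes an attrition rate of 30%, which is not stated in the question but is consistent with going from 106 to 152.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample t-test.

    Normal approximation plus Guenther's correction (z_a^2 / 4),
    which stands in for an exact noncentral-t calculation.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4)


n_total = 2 * n_per_group(0.55)

# ASSUMPTION: 30% loss (follow-up + death before QOL measurement);
# this rate is not given in the question.
ATTRITION = 0.30
n_accrual = ceil(n_total / (1 - ATTRITION))

print(n_total, n_accrual)  # prints: 106 152
```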

The trial is multicentre, with randomization stratified by centre. Patients can have different tumor types (NSCLC, cholangiocarcinoma, stomach, pancreatic).

The funding institution's referees asked us to (freely translating) “increase the sample size to account for the heterogeneity of the patients involved”. They did not request randomization stratified by centre × tumor type (perhaps because of the likely imbalance in a small sample), but eventually suggested minimization.

I'm starting to think about a linear model like the following to fulfil the request:

QOL = f(Treatment.dummy, TumorType.dummies)

Typically the effect size relies on the R^2 with and without the covariate of interest. However, given the state of the literature, I don't know how to make an educated guess about these quantities.
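For reference, the effect size usually meant here is Cohen's $f^2$ for the added covariate. The following is only a sketch of how it connects to the t-test effect size. It assumes equal allocation to the two arms, treatment roughly balanced across tumor types, and a $d$ standardized by the residual SD after adjusting for tumor type; none of these is guaranteed by the trial design.

$$
f^2 = \frac{R^2_{\text{full}} - R^2_{\text{reduced}}}{1 - R^2_{\text{full}}}, \qquad f^2 \approx \frac{d^2}{4} = \frac{0.55^2}{4} \approx 0.076
$$

Here "full" is the model with Treatment.dummy and the tumor-type dummies, and "reduced" drops Treatment.dummy.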

The only idea that remains (at least to me) is to simulate with Monte Carlo. In the end it would be an F test of the model with vs. without Treatment.dummy.

Given a certain n (growing between simulation steps):

  • I would simulate the proportion of patients recruited with each tumor type, using information from our clinical database.
  • Given the tumor type, I would simulate the proportion of patients randomized to Treatment in that stratum (I'm thinking of a uniform distribution with mean 0.5, e.g. U(0.35, 0.65)).
  • For patients in the control group, given the tumor type, I would simulate QOL from the normative data of the instrument.
  • For patients in the treatment group, given the tumor type, I would simulate QOL from the normative data of the instrument AND the effect size of interest.

Then fit the regression model, run the test, and estimate the power as the proportion of simulated trials in which the test is significant.
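The steps above could be sketched as follows. This is an illustration under stated assumptions, not a validated design tool: the tumor-type proportions and control-group means are hypothetical placeholders (in SD units), the residual SD is fixed at 1, and only the standard library is used. The treatment coefficient is obtained by centring QOL and treatment within tumor type, which gives the same estimate as the regression with tumor-type dummies (Frisch-Waugh-Lovell). Since the F test has one numerator degree of freedom, it is equivalent to the two-sided t test on that coefficient; the critical value comes from a Cornish-Fisher approximation rather than an exact t quantile.

```python
import random
from statistics import NormalDist

# HYPOTHETICAL inputs: replace with the clinical database proportions
# and the instrument's normative data (means here are in SD units).
TUMOR_PROPS = {"nsclc": 0.40, "cholangiocarcinoma": 0.15,
               "stomach": 0.25, "pancreatic": 0.20}
CONTROL_MEAN = {"nsclc": 0.0, "cholangiocarcinoma": -0.3,
                "stomach": 0.1, "pancreatic": -0.5}


def t_crit(df, alpha=0.05):
    """Approximate two-sided t critical value (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def one_trial(n, effect, rng):
    """Simulate one trial; True if the treatment effect is significant."""
    types = list(TUMOR_PROPS)
    # Allocation probability per stratum, U(0.35, 0.65) as in the question.
    p_treat = {k: rng.uniform(0.35, 0.65) for k in types}
    tumor = rng.choices(types, weights=[TUMOR_PROPS[k] for k in types], k=n)
    treat = [1.0 if rng.random() < p_treat[k] else 0.0 for k in tumor]
    qol = [rng.gauss(CONTROL_MEAN[k] + effect * t, 1.0)
           for k, t in zip(tumor, treat)]

    def centre(values):
        """Subtract the tumor-type mean from each value."""
        sums, counts = {}, {}
        for k, v in zip(tumor, values):
            sums[k] = sums.get(k, 0.0) + v
            counts[k] = counts.get(k, 0) + 1
        return [v - sums[k] / counts[k] for k, v in zip(tumor, values)]

    y_c, t_c = centre(qol), centre(treat)
    sxx = sum(t * t for t in t_c)
    if sxx == 0:  # no within-stratum variation in treatment
        return False
    beta = sum(t * y for t, y in zip(t_c, y_c)) / sxx
    rss = sum((y - beta * t) ** 2 for t, y in zip(t_c, y_c))
    df = n - len(set(tumor)) - 1  # one mean per tumor type + treatment
    t_stat = beta / (rss / df / sxx) ** 0.5
    return abs(t_stat) > t_crit(df)


def power(n, effect, n_sims=2000, seed=1):
    """Proportion of simulated trials with a significant treatment effect."""
    rng = random.Random(seed)
    return sum(one_trial(n, effect, rng) for _ in range(n_sims)) / n_sims


if __name__ == "__main__":
    for n in (106, 120, 140):
        print(n, power(n, effect=0.55))
```

Running `power(n, effect=0.0)` is a quick sanity check: the rejection rate should sit near alpha. In practice one would more likely fit the model with a statistics package and read off the F test directly; the hand-rolled version only keeps the sketch dependency-free.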

Another way would be to start from the t-test sample size and then increase it using a rule of thumb for model sample size (e.g. 20 patients per added variable), but I don't like it very much because you lose the connection with the power analysis.

Any other approach? Any suggestion (even “yes, do the Monte Carlo simulation”) would be very appreciated.

Best Answer

I am not sure how you would even simulate data if you don't know what parameters to put in. As you said, that involves $R^2$ with and without covariates; you might not explicitly enter those into a simulation, but they'd be there in the raw data.

If the literature doesn't have good estimates for your particular area, does it have them for any related areas? Some other form of cancer, perhaps? I'd be surprised if there was nothing usable - cancer (as you doubtless know) has been researched a lot! But if you can't find anything, you have to guess and then you have to be able to defend your guess.

Once you make a guess, you could either simulate the data or use standard power calculations. The former gives you a lot more control but is more complex and takes longer. The latter is easy but makes assumptions (sometimes hidden ones) in the calculation.
