multiple-regression – How to Calculate Necessary Treatment Group Size for Power in a Regression Setting

multiple-regression, statistical-power

I have an observational, quasi-experimental study in which I try to estimate the effect of a "treatment" (participation in a programme) on a continuous outcome.

Some two-thirds of all participants are matched with non-participants on a few background characteristics. To estimate the effect (difference-in-differences), I use a multiple regression. With this method I get an estimated effect size of approximately 0.10, not significant (standard error 0.09). In the original data set I had more than 6,000 treated and around 200,000 untreated in a "comparison group".

It was suggested that the actual "treatment" was too small (it ranges from 1 to 9 visits to a counselling provider) and that the threshold for being counted as treated should be moved up to "more than one visit".

This suggestion reduces the number of treated quite badly (the number of matched controls also decreases). Defining treated as observations with more than one visit reduces the number of treated observations to 740, because of a skewed distribution in the number of visits and necessary qualifications of what constitutes a usable "treatment spell". I am quite worried about the power of the new estimate. I would like to reject the suggestion, citing a further loss of power due to the small effect size and the reduced sample.
But how would I calculate how many observations I need to "keep the power" in this regression setting (as a rebuttal)? Just calculating the difference in group means does not control for secular drift or other confounders.

Any help is appreciated; I hope I have explained my problem adequately. The model is
$$
\ln Y_t = \alpha + \beta(\text{Treated} \times \text{After}) + \gamma_1 \text{Treated} + \gamma_2 \text{After} + \gamma_3 X
$$
where $\beta$ is the effect size,
Treated = 1 if treated (0 otherwise),
After = 1 if the observation period is after treatment (0 if before treatment), and
X is a set of other explanatory variables.
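For concreteness, here is a minimal sketch of how such a difference-in-differences specification could be estimated in Python with statsmodels; the data frame, the covariate `x1`, and all numbers are synthetic placeholders standing in for my actual variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the matched panel: treatment/period dummies plus one covariate.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "after":   rng.integers(0, 2, n),
    "x1":      rng.normal(size=n),
})
true_beta = 0.10
df["log_y"] = (0.5 + true_beta * df["treated"] * df["after"]
               + 0.2 * df["treated"] + 0.1 * df["after"]
               + 0.3 * df["x1"] + rng.normal(scale=1.0, size=n))

# Difference-in-differences: beta is the coefficient on the treated:after interaction.
fit = smf.ols("log_y ~ treated * after + x1", data=df).fit()
print("beta hat:", fit.params["treated:after"])
print("std err: ", fit.bse["treated:after"])
```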

PS: Just redoing the estimation with the new treatment definition gives an effect size of 0.064, standard error 0.071. More treatment, smaller average effect. Well, fancy that!

Best Answer

You could calculate the minimum detectable effect (MDE) for the average treatment effect (ATE) under the assumption that your outcome $Y$ is normally distributed. Then $$\text{MDE} = \sqrt{\frac{\widehat{\text{Var}(Y)}}{n}}\sqrt{\frac{1}{p(1-p)}}\left( q_{1-\frac{\alpha}{2}}+q_\lambda\right)$$ where $\widehat{\text{Var}(Y)}$ is the estimated variance of the outcome, $n$ is the sample size, $p$ is the fraction of programme participants, and $q_{1-\frac{\alpha}{2}}$ and $q_{\lambda}$ are the $\left(1-\frac{\alpha}{2}\right)^{\text{th}}$ and $\lambda^{\text{th}}$ quantiles of the standard normal distribution; $\alpha$ is the significance level and $\lambda$ is the desired power, both chosen by you. Typical choices are $\alpha = 0.05$ and $\lambda = 0.80$.
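As a sketch, this is how the formula could be computed in Python; the inputs passed to `mde()` below are made-up placeholders you would replace with values from your own data:

```python
from scipy.stats import norm

def mde(var_y, n, p, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-group comparison of means,
    assuming an (approximately) normal outcome."""
    se = (var_y / n) ** 0.5 * (1.0 / (p * (1.0 - p))) ** 0.5
    return se * (norm.ppf(1 - alpha / 2) + norm.ppf(power))

# Illustration only: variance 1, 740 treated plus an equal number of matched controls.
print(mde(var_y=1.0, n=1480, p=0.5))
```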

In terms of participation (the number of treated relative to untreated), the MDE is smallest when $p=0.5$, i.e. when you have the same number of treated and untreated individuals. The MDE also decreases as you increase the sample size $n$.
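To see how costly imbalance is, you can tabulate the factor $\sqrt{1/(p(1-p))}$ for a few values of $p$:

```python
# Inflation of the MDE (relative to sqrt(Var(Y)/n)) as the treated share p shrinks.
for p in (0.5, 0.3, 0.1, 0.03):
    print(f"p = {p:.2f}: factor = {(1 / (p * (1 - p))) ** 0.5:.2f}")
```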

All of this is easily done in your case. $\widehat{\text{Var}(Y)}$, $n$ and $p$ you can compute from your data; alternatively, note that the standard error of your treatment coefficient already approximates $\sqrt{\widehat{\text{Var}(Y)}/n}\sqrt{1/(p(1-p))}$, so you can simply multiply it by the sum of the quantiles. The quantiles of the standard normal distribution are $q_{1-\frac{\alpha}{2}} = 1.96$ and $q_{\lambda} \approx 0.84$ for $\alpha = 0.05$ and $\lambda = 0.80$. If you then find that your estimated treatment effect is smaller than the MDE, $\beta < \text{MDE}$, you are under-powered. The only solutions in that case are to increase the sample size, bring the numbers of treated and untreated closer to balance, or accept a higher significance level/lower power.
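Plugging in the figures from your question purely as an illustration (taking the reported standard errors at face value), the check boils down to:

```python
# MDE = (1.96 + 0.84) * SE, i.e. roughly 2.8 times the regression standard error.
z = 1.96 + 0.84  # alpha = 0.05 (two-sided), power = 0.80
for beta_hat, se in [(0.10, 0.09), (0.064, 0.071)]:
    mde = z * se
    print(f"beta = {beta_hat:.3f}, MDE = {mde:.3f}, under-powered: {beta_hat < mde}")
```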
