Assuming you are going to average the first 12 months to form a baseline measure and the second 12 months to form a follow-up measure, your problem reduces to a repeated measures (paired) t-test.
G*Power
You might want to check out the following menu in G*Power 3:
Tests - Means - Two Dependent Groups (matched pairs)
Use A priori, $\alpha=.05$, Power = 0.90.
Use the Determine button to determine the effect size. This requires that you can estimate the time 1 and time 2 means, their SDs, and the correlation between time points.
If you know nothing about the domain, based on my experience in psychology, I'd start with something like
M1 = 0, SD1 = 1, SD2 = 1
correlation = .60
With M1 = 0 and both SDs equal to 1, this means that M2 is basically a between-subjects Cohen's d.
You could then examine a few different values of M2 such as 0.2, 0.3, ... 0.5, ... 0.8, etc. Cohen's rules of thumb suggest 0.2 is small, 0.5 is medium, and 0.8 is large.
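To see how these inputs turn into an effect size for paired data, here is a minimal Python sketch. It assumes the usual standardized difference for matched pairs, $d_z = (M_2 - M_1)/\sqrt{SD_1^2 + SD_2^2 - 2r\,SD_1 SD_2}$, and a normal approximation for the number of pairs, so it will come out slightly below an exact t-based calculation. The function names `paired_dz` and `approx_pairs_needed` are made up for illustration.

```python
import math
from statistics import NormalDist


def paired_dz(m1, m2, sd1, sd2, r):
    """Standardized mean difference for paired data (d_z)."""
    # SD of the difference scores, given the two SDs and their correlation r
    sd_diff = math.sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)
    return (m2 - m1) / sd_diff


def approx_pairs_needed(dz, alpha=0.05, power=0.90):
    """Normal-approximation number of pairs for a two-sided paired t-test.

    This is an approximation; an exact noncentral-t calculation
    gives a slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / abs(dz)) ** 2)


if __name__ == "__main__":
    # Starting values suggested in the text: M1 = 0, SD1 = SD2 = 1, r = .60
    for m2 in (0.2, 0.3, 0.5, 0.8):
        dz = paired_dz(0.0, m2, 1.0, 1.0, 0.60)
        print(f"M2={m2:.1f}  dz={dz:.3f}  approx pairs={approx_pairs_needed(dz)}")
```

Note that with a correlation of .60 the paired effect size is larger than M2 itself, which is why a repeated measures design needs fewer participants than a between-subjects design with the same mean difference.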
R
UCLA has a tutorial on doing a power analysis on a repeated measures t-test using R.
Side point
As a side point, you might want to consider having a control group.
If I had to do this, I would use a simulation approach. This would involve making assumptions about the regression coefficients, predictor distributions, correlation between predictors, and error variance (with help from the researcher), generating data sets using the assumed model, and seeing what proportion of these give a significant p-value for the interaction. Then use trial and error to find the minimum sample size giving the required power.
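A rough Python sketch of that simulation approach follows. Everything specific in it is an assumption for illustration: two standard normal predictors with correlation `rho`, a single product-term interaction, normal errors with SD `sigma`, and placeholder coefficient values. It uses only the standard library, fits the regression by ordinary least squares, and approximates the t critical value with a Cornish-Fisher expansion rather than an exact quantile.

```python
import math
import random
from statistics import NormalDist


def t_crit_approx(df, alpha=0.05):
    """Approximate two-sided t critical value (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interaction_significant(n, b, rho, sigma, alpha, rng):
    """Generate one data set and test the interaction coefficient b[3]."""
    rows, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rho * x1 + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        row = [1.0, x1, x2, x1 * x2]  # intercept, main effects, interaction
        rows.append(row)
        ys.append(sum(bi * xi for bi, xi in zip(b, row)) + rng.gauss(0, sigma))
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y - sum(bi * xi for bi, xi in zip(beta, r))) ** 2
              for r, y in zip(rows, ys))
    se = math.sqrt(rss / (n - k) * inv[3][3])
    return abs(beta[3] / se) > t_crit_approx(n - k, alpha)


def simulate_power(n, b, rho=0.3, sigma=1.0, alpha=0.05, reps=500, seed=1):
    """Proportion of simulated data sets with a significant interaction."""
    rng = random.Random(seed)
    hits = sum(interaction_significant(n, b, rho, sigma, alpha, rng)
               for _ in range(reps))
    return hits / reps


def smallest_n(b, target=0.80, candidates=range(20, 401, 20), **kwargs):
    """Trial and error: first candidate n whose estimated power reaches target."""
    for n in candidates:
        if simulate_power(n, b, **kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    # Placeholder coefficients (intercept, x1, x2, x1*x2): assumptions only
    print(smallest_n([0.0, 0.3, 0.3, 0.3], reps=300))
```

In practice the assumed coefficients, predictor distributions, and error variance would come from the researcher, and you would raise `reps` and narrow the grid of candidate sample sizes once you know roughly where the answer lies.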
Best Answer
I agree with @AndyW that a key consideration in a power and sample size determination is to decide how big a difference between two means would be a result of practical importance.
A more difficult ingredient in such a determination is to estimate the variances of the responses. Maybe you can get a clue about the variances by looking at prior studies using similar methods.
You also need to know the significance level of the test you will use: often 5%, sometimes in medical studies 1% or smaller. And you need to have a good idea what power the test needs to have. Often people want at least 80% or 90% probability of detecting an effect of the desired size, if real.
Do not be dismayed that some of the necessary inputs into a power and sample size determination are inevitably guesses. Such a determination based on reasonable guesses is almost always much better than none at all.
Suppose you want 85% power for a two-sample t test at the 5% level to detect a difference in means that is 5 units, when the standard deviation of the observations may be 10 units. Many statistical software programs have 'power and sample size' procedures for balanced studies (equal sample sizes in the two groups).
Below is a printout from a recent release of Minitab that includes the situation I described in the previous paragraph.
Power and sample size procedures are available in many statistical software programs for many of the most common tests.
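For the balanced two-sample case described above (85% power, 5% level, difference of 5 units, SD of 10 units), the textbook normal-approximation formula $n \approx 2\,(z_{1-\alpha/2} + z_{\text{power}})^2 \sigma^2 / \delta^2$ per group can be sketched in Python as a rough cross-check. This is not what any particular package computes; exact procedures use the noncentral t distribution and typically give a slightly larger n. The name `n_per_group` is just for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.85):
    """Approximate per-group n for a balanced two-sided two-sample t test.

    Normal approximation only; exact t-based software will usually
    report a slightly larger sample size.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


if __name__ == "__main__":
    # Scenario from the text: difference 5, SD 10, 5% level, 85% power
    print(n_per_group(delta=5, sd=10))
```

Trying a few values of `sd` is a quick way to see how sensitive the answer is to the guess about the variance.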
Some procedures require simulation. For example, if you know ahead of time that one of the two groups in a two-sample t test has a larger variance, so that you'll need to do a Welch t test, you may need simulation for that. Also, most software assumes equal sample sizes in the two groups. If financial constraints require one group to be smaller than the other, then you'd probably need simulation.
The simulation in R below addresses the situation where one of the two groups in a two-sample t test has SD $9$ and the other has SD $11.$ With $70$ observations in each group the power is about 83% even using a Welch test to accommodate the slightly different group standard deviations.
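The same kind of simulation can also be sketched in Python with only the standard library. Two things here are assumptions rather than anything stated above: the true difference in means is taken to be 5 units (carried over from the earlier example), and the t critical value is approximated by a Cornish-Fisher expansion instead of an exact quantile. Unequal group sizes can be tried by changing `n1` and `n2`.

```python
import math
import random
from statistics import NormalDist


def t_crit_approx(df, alpha=0.05):
    """Approximate two-sided t critical value (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def welch_rejects(x, y, alpha=0.05):
    """Two-sided Welch t test; True if the null of equal means is rejected."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    ex, ey = vx / len(x), vy / len(y)  # squared standard errors of the means
    t = (mx - my) / math.sqrt(ex + ey)
    # Welch-Satterthwaite degrees of freedom
    df = (ex + ey) ** 2 / (ex**2 / (len(x) - 1) + ey**2 / (len(y) - 1))
    return abs(t) > t_crit_approx(df, alpha)


def simulate_power(n1, n2, sd1, sd2, delta, reps=5000, seed=1):
    """Estimate power as the rejection rate over simulated normal samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(delta, sd2) for _ in range(n2)]
        hits += welch_rejects(x, y)
    return hits / reps


if __name__ == "__main__":
    # SDs 9 and 11 with 70 per group; delta = 5 is an assumed difference
    print(simulate_power(n1=70, n2=70, sd1=9, sd2=11, delta=5))
```

Because the estimate is a proportion over `reps` simulated data sets, it carries Monte Carlo error; with 5000 replications the standard error near 83% power is roughly half a percentage point.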