I am planning a factorial experiment with two factors, both of them ordered. Factor 1 has two levels: small and large. Factor 2 has four levels: never, sometimes, frequently, and often. I also want to run the experiment at a number of locations, so I will include location as a kind of block. I expect larger responses at higher levels of both factors, and I expect an interaction effect as well. Thus, my model is: Response ~ Block + Factor1*Factor2 + error. The experiment will have at least 40 observations, perhaps 80 or 120, and so on until I can detect an effect.
I'll be measuring a number of response variables, most of which will be counts or zero-truncated values (e.g., response latency). I'm wondering how to simulate responses from my model under the assumption of a moderate effect size. I want to know what sample size is appropriate to detect a moderate effect of my treatments, but I'm not familiar enough with simulation to know where to start with such a problem. Any advice, direction, or requests for more information would be much appreciated.
Additional information: I'm using R to do everything.
EDIT:
I implemented Mark T Patterson's answer to my question, modifying it to fit my particular experimental setup and to simulate Poisson data, but I get warnings when I run the function. Fortunately, there are some relevant answers on Cross Validated: Generate data samples from Poisson regression. I'll keep learning how to simulate data matching the other kinds of response variables I'll be measuring.
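For reference, one way to simulate Poisson responses is to exponentiate the linear predictor so that rpois() receives a valid, non-negative mean; passing a raw linear predictor that can go negative is a common source of NA warnings from rpois(). A minimal sketch (the predictor and coefficients below are illustrative, not from the original code):

```r
# Simulate Poisson counts via a log link: mean = exp(linear predictor).
set.seed(1)
x <- rep(1:4, each = 25)                  # e.g. the four levels of Factor 2
eta <- 0.2 + 0.3 * x                      # linear predictor on the log scale
y <- rpois(length(x), lambda = exp(eta))  # exp() keeps lambda non-negative
fit <- glm(y ~ x, family = poisson)       # should roughly recover 0.2 and 0.3
```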
Best Answer
Here are a few ideas to get you started --
Simulation usually has two parts: we'll want a function to generate sample data, and then a function to analyze the results of our simulation.
This setup has a lot of flexibility -- you can (and should) modify the code to match the causal relationships you expect to find.
Here's an example of a function to generate data for a continuous outcome variable:
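A minimal sketch of such a generator, assuming two crossed ordered factors plus a random block (location) shift; the names, coefficients, and block-effect structure below are illustrative assumptions:

```r
# Generate one simulated data set: block = location, f1 = 2-level factor,
# f2 = 4-level factor, with a linear trend over the ordered levels.
gen.data <- function(n.per.cell = 5, n.blocks = 4,
                     b.f1 = 0.5, b.f2 = 0.3, b.int = 0.2,
                     sd.block = 0.5, sd.error = 1) {
  df <- expand.grid(block = 1:n.blocks, f1 = 1:2, f2 = 1:4,
                    rep = 1:n.per.cell)
  block.eff <- rnorm(n.blocks, 0, sd.block)     # random location shifts
  # Linear predictor: block shift + main effects + interaction.
  mu <- block.eff[df$block] +
        b.f1 * df$f1 + b.f2 * df$f2 + b.int * df$f1 * df$f2
  df$y <- mu + rnorm(nrow(df), 0, sd.error)     # continuous outcome
  df$block <- factor(df$block)
  df$f1 <- factor(df$f1)
  df$f2 <- factor(df$f2)
  df
}
```

With the defaults this yields 4 blocks x 2 x 4 cells x 5 replicates = 160 rows; shrink n.per.cell or n.blocks to explore the 40-120 observation range.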
Now, we're ready to run the simulation:
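A sketch of that step: generate a data set, fit the model, and keep the p-values for each term, many times over. The generator is repeated inline here so the chunk runs on its own, and its names and coefficients are illustrative:

```r
# Compact data generator (same shape as described above).
gen.data <- function(n.per.cell = 5, n.blocks = 4) {
  df <- expand.grid(block = 1:n.blocks, f1 = 1:2, f2 = 1:4,
                    rep = 1:n.per.cell)
  mu <- rnorm(n.blocks, 0, 0.5)[df$block] +
        0.5 * df$f1 + 0.3 * df$f2 + 0.2 * df$f1 * df$f2
  df$y <- mu + rnorm(nrow(df))
  df[c("block", "f1", "f2")] <- lapply(df[c("block", "f1", "f2")], factor)
  df
}

# Fit the model to n.sims simulated data sets, collecting the ANOVA
# p-values for each factor and the interaction.
run.sim <- function(n.sims = 200, ...) {
  pvals <- replicate(n.sims, {
    fit <- lm(y ~ block + f1 * f2, data = gen.data(...))
    anova(fit)[c("f1", "f2", "f1:f2"), "Pr(>F)"]
  })
  t(pvals)   # one row per simulation; columns: f1, f2, interaction
}
```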
Finally, we can write whatever functions we want to check the power -- here, I just report the proportion of experiments in which the p-value (for each factor, and for the interaction term) is less than 0.05:
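A sketch of that check, assuming pvals is a matrix with one row per simulated experiment and one column per model term (as produced by a simulation loop like the one above):

```r
# Estimated power: the proportion of simulations in which each term's
# p-value falls below alpha.
check.power <- function(pvals, alpha = 0.05) {
  colMeans(pvals < alpha)
}

# Illustration on fake p-values drawn uniformly on [0, 1]:
pvals <- matrix(runif(300), ncol = 3,
                dimnames = list(NULL, c("f1", "f2", "f1:f2")))
check.power(pvals)
```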
The setup I've created doesn't capture the count data you're interested in. If you'd like to build in that feature, start by modifying the df$y bit of the code. You may also want entirely different coefficients, or to test a different model entirely. Finally, rather than reporting the proportion of significant results, you may want to plot the coefficients or p-values.
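For instance, the df$y line could be swapped to draw counts instead of adding Gaussian noise (a condensed sketch with illustrative coefficients; the rest of the generator is unchanged):

```r
# Replace df$y <- mu + rnorm(...) with a Poisson draw on the exp scale.
set.seed(3)
df <- expand.grid(f1 = 1:2, f2 = 1:4, rep = 1:10)
eta <- 0.1 + 0.4 * df$f1 + 0.2 * df$f2        # log-scale linear predictor
df$y <- rpois(nrow(df), lambda = exp(eta))    # counts rather than Gaussian y
```

The downstream analysis would then fit glm(..., family = poisson) rather than lm().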
Hope this gets you started!