Solved – How to simulate an unreplicated factorial design

experiment-design, r, random-generation, regression, simulation

I am dealing with an unreplicated factorial design. I have some illustrative examples, but I need to simulate some unreplicated factorial designs. I do not know how to do this or what to use. Can R handle this?

For example, I would like to analyse a $2^{4}$ factorial design (factors A, B, C and D) with a single replicate (16 runs) and 15 contrasts. I have a single column for the response. I would like to compare some methods from the literature to see which detects active effects best. Thus, I set the active effects to have the same magnitude of $1.5\sigma$, and I would like to generate $100$ response vectors using errors that are i.i.d. $\mathcal N(0, 1)$. My true model has four active effects: $y=3+1.5A+1.5B+1.5C+1.5BC$. But I do not know how to generate data like this in R.

An example from Montgomery for $2^4$ unreplicated factorial design

Thanks, gung, for your reply. I had just written some simple code before I saw your answer here. I think I need to build up a bit more R knowledge. Anyway, here it is:

For the analysis of unreplicated factorial designs with $k$ factors and $p=2^{k}-1$ factorial effects (the main effects and interactions), the following model is generally used

\begin{equation}
y=\sum\limits_{i=0}^{p}x_{i}\beta_{i}+\varepsilon
\end{equation}

First, I set up my sign table for the $2^{4}$ design and the $\beta$ coefficients of the so-called active effects.

The sign table consists of rows (runs) and columns (the general mean plus the contrasts).

Then I wrote down my regression equation, with the chosen magnitudes for the active effects and zeros for the remaining inactive effects. My simulated model, for example, was $y=3+1.5A+1.5B+1.5C+1.5BC$.

Then I ran the code below:

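# sign table: 16 runs x 16 columns (general mean + 15 contrasts)
x = read.csv("sign2.txt", header = TRUE)
sign = as.matrix(x)
is.matrix(sign)

# coefficients, read as a 1 x 16 row (zeros for the inactive effects)
y = read.csv("beta2.txt", header = TRUE)
beta = as.matrix(y)
is.matrix(beta)

signt = t(sign)

# expected response for each run, as a 16 x 1 column
bs = t(beta %*% signt)

# i.i.d. N(0, 1) errors, one per run
epsilon = matrix(rnorm(16 * 1, mean = 0, sd = 1), 16, 1)

response = bs + epsilon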

Unfortunately, this produces only one simulated response vector. I will add a loop to run the simulation n times.
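
The repetition can also be done without an explicit loop. The following is only a sketch under stated assumptions: the sign table is rebuilt with expand.grid and model.matrix rather than read from sign2.txt, the seed is arbitrary, and the names design, X, n_sim, errors, responses and estimates are illustrative and not part of the original code.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Full 2^4 design in -1/+1 coding (16 runs)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))

# 16 x 16 sign table: general mean, 4 main effects and 11 interactions
X <- model.matrix(~ A * B * C * D, data = design)

# True coefficients: 3 for the mean, 1.5 for A, B, C and BC, 0 elsewhere
beta <- setNames(numeric(ncol(X)), colnames(X))
beta[c("(Intercept)", "A", "B", "C", "B:C")] <- c(3, 1.5, 1.5, 1.5, 1.5)

n_sim <- 100
# Each column of `errors` is one N(0, 1) error vector of length 16
errors <- matrix(rnorm(16 * n_sim, mean = 0, sd = 1), nrow = 16, ncol = n_sim)

# 16 x 100 matrix: column j is the j-th simulated response vector
responses <- as.vector(X %*% beta) + errors

# Least-squares coefficient estimates for every simulation (16 x 100).
# The columns of X are orthogonal, so t(X) %*% X equals 16 times the identity.
estimates <- crossprod(X, responses) / 16
```

Each column of responses can then be passed to whichever detection method is being compared, and the rows of estimates (named after the model terms) give the estimated coefficients across the 100 simulations.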

Best Answer

The key to generating random data like yours in R is to use ?rnorm. You may also want to set the random seed to a fixed value, via ?set.seed, so that your simulation can be exactly replicated in the future. For convenience, you may prefer to use ?expand.grid to create your factor combinations, although you can do it manually as well (and could conceivably prefer that for the clarity it affords). Once you have set up your simulation, you will want to run it many times to get a sense of its long run behavior; this can be done using ?replicate, I believe, but I usually just nest it in a for loop because I specialize in writing comically inefficient code.

Here is an example:

> set.seed(1)
> A <- c(0,1)
> B <- c(0,1)
> C <- c(0,1)
> D <- c(0,1)
> Xmat <- expand.grid(A=A, B=B, C=C, D=D)
> Xmat
   A B C D
1  0 0 0 0
2  1 0 0 0
3  0 1 0 0
4  1 1 0 0
5  0 0 1 0
6  1 0 1 0
7  0 1 1 0
8  1 1 1 0
9  0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
> y <- 3 + 1.5*Xmat$A + 1.5*Xmat$B + 1.5*Xmat$C + 1.5*Xmat$B*Xmat$C + 
+      rnorm(n=16, mean=0, sd=1)
> y
 [1] 2.373546 4.683643 3.664371 7.595281 4.829508 5.179532 7.987429 9.738325
 [9] 3.575781 4.194612 6.011781 6.389843 3.878759 3.785300 8.624931 8.955066
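
To get many response vectors rather than one, the same generating step can be wrapped in a function and handed to replicate. This is a minimal sketch, assuming the 0/1 coding used above; the helper name sim_y and the matrix name Y are made up for illustration.

```r
set.seed(1)
Xmat <- expand.grid(A = 0:1, B = 0:1, C = 0:1, D = 0:1)

# One simulated response vector of length 16 from the true model
sim_y <- function() {
  with(Xmat, 3 + 1.5*A + 1.5*B + 1.5*C + 1.5*B*C) +
    rnorm(n = 16, mean = 0, sd = 1)
}

# replicate() calls sim_y() 100 times and binds the results column-wise,
# giving a 16 x 100 matrix with one response vector per column
Y <- replicate(100, sim_y())
dim(Y)
```

Because the seed is the same and the first call consumes the first 16 normal draws, the first column of Y should match the single vector y printed above.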

Related Solutions

Solved – Simulating responses from a factorial experiment for power analysis

Here are a few ideas to get you started --

Simulation usually has two parts: we'll want a function to generate sample data, and then a function to analyze the results of our simulation.

This setup has a lot of flexibility -- you can (and should) modify the code to match the causal relationships you expect to find.

Here's an example of a function to generate data for a continuous outcome variable:

# a single draw of simulated data will have n observations
# we'll replicate this B times:
data.gen = function(n, B){

# before generating simulated data, make an empty matrix to 
# hold the p-values we're going to keep track of:
p.vals = matrix(rep(NA,B*3),ncol = 3)  

# we want to replicate the process B times:
for(i in 1:B){  

# for the setup I have, I'm assuming 4 evenly sized blocks
# this function ensures n is a multiple of 4:    
stopifnot(n%%4 == 0)  

# creating the sample data (independent vars) for a single draw:
block  = factor(rep(1:4, each = n/4))
fact.1 = rbinom(n,1,.5)
fact.2 = sample(0:3, n, replace = TRUE)
error  = rnorm(n,0,3)

# create a dataframe:
df = data.frame(block, fact.1,fact.2, error)

# code the block factors (there's probably a better way to do this)
df$block.1 = as.numeric(df$block == 1)
df$block.2 = as.numeric(df$block == 2)
df$block.3 = as.numeric(df$block == 3)
df$block.4 = as.numeric(df$block == 4)

# specify the true relationship between your dv and your regressors:
# note: my choices here were entirely arbitrary.. you will definitely
# want to change these:

# block variable coefficients:   
b.1 = 0.5
b.2 = -.5
b.3 = -1
b.4 = 0.5

# factor variable coefficients:
b.f1 = 3
b.f2 = 4

# interaction:
b.f1f2 = 2


# specifying the true relationship between your regressors and your DV:
df$y = with(df,block.1*b.1 + block.2*b.2 + block.3*b.3 + block.4*b.4 +
              b.f1*fact.1 + b.f2*fact.2 + b.f1f2*fact.1*fact.2 + error)


# fit a model:
lm.1 = lm(y ~ block + fact.1*fact.2, data = df)


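# save the p-values from the regression in the matrix you created;
# select the rows by name, since rows 3:5 would pick up block
# coefficients instead of fact.2 and the interaction:
p.vals[i,] = summary(lm.1)$coefficients[c("fact.1","fact.2","fact.1:fact.2"),4]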

}

# clean up the data a bit -- 
p.vals = data.frame(p.vals)
names(p.vals) = c("fact.1","fact.2","int")

# return the p-values:
return(p.vals)

}

Now, we're ready to run the simulation:

# running an experiment with n = 80, B = 1000 takes about 5 seconds:
sim.dat = data.gen(80,1000)

Finally, we can write whatever functions we want to check the power. Here, I just report the proportion of experiments that result in a p-value (for each factor, and the interaction term) less than 0.05:

sum(sim.dat$fact.1<.05)/length(sim.dat$fact.1)
sum(sim.dat$fact.2<.05)/length(sim.dat$fact.2)
sum(sim.dat$int<.05)/length(sim.dat$int)

The setup I've created doesn't capture the count data you're interested in. If you'd like to build in that feature, start by modifying the df$y bit of the code. Also, you may want entirely different coefficients, or to test a different model entirely. Finally, rather than reporting the proportion of significant results, you may want to consider plotting the coefficients or p-values.
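
As a rough illustration of what a count-data version might look like, here is a sketch that swaps the normal outcome for a Poisson one and fits a Poisson regression. Everything specific in it is an assumption: the helper name one.draw, the log-linear mean, the coefficient values, and the choice to leave out blocks and the interaction for brevity.

```r
set.seed(42)  # arbitrary seed

# Simulate one experiment with a Poisson outcome and return two p-values.
# All coefficients below are hypothetical placeholders.
one.draw = function(n) {
  fact.1 = rbinom(n, 1, .5)
  fact.2 = sample(0:3, n, replace = TRUE)
  # log-linear model for the expected count
  mu = exp(0.5 + 0.4*fact.1 + 0.2*fact.2)
  y  = rpois(n, lambda = mu)
  fit = glm(y ~ fact.1 + fact.2, family = poisson)
  summary(fit)$coefficients[c("fact.1", "fact.2"), "Pr(>|z|)"]
}

# 500 simulated experiments of n = 80; rows are experiments
p.vals = t(replicate(500, one.draw(80)))

# estimated power: share of p-values below 0.05 for each term
colMeans(p.vals < .05)
```

colMeans on the logical matrix gives the same proportions as the sum/length lines above, one per column.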

Hope this gets you started!

Logistic – Conducting Power Analysis for Factorial Logistic Regression Without Estimated Proportions

As I discuss in my answer to your linked question, there are different kinds of power when there are multiple hypotheses you want to test. For example, you can talk about the all effects power, the power to detect a specific effect, or the any effect power (these are in weakly ascending order, since detecting every effect is at least as hard as detecting any single one). If you only care about the one effect, and the other effects are nuisances, you can do what you suggest. (Technically, you should add a few additional observations to account for the degrees of freedom that will be lost to the nuisance parameters, but that seems inconsequential in your case with so much data anyway.)

On the other hand, if you care about all of these effects, and they are orthogonal (as suggested by "a balanced factorial designed experiment"), then you could do what you suggest for each effect. The all effects power would be the product of the powers for the three specified effects. For instance, let's say that at a given N, the prespecified single effect powers are .82, .80, and .67. Then the power to detect all three would be .82 * .80 * .67 ≈ 0.44.
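
The arithmetic can be checked in R. This is a small sketch that assumes the three tests are independent; the variable names are illustrative, and the any effect power is included only for comparison.

```r
# single-effect powers from the example above
pw <- c(0.82, 0.80, 0.67)

# power to detect all three effects, assuming independent tests
all_effects <- prod(pw)

# power to detect at least one effect, under the same assumption
any_effect <- 1 - prod(1 - pw)

round(c(all = all_effects, any = any_effect), 2)
```

Under that assumption the all effects power comes out at about 0.44 and the any effect power at about 0.99, which brackets the single effect powers as described above.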