Experiment Design – Full and Fractional Factorial Designs Explained

I have a data set consisting of six independent environmental variables (all binary: present / absent) and one dependent variable (binary: disease present / absent).
To determine which combination of factors has the highest probability of leading to disease, I first need to conduct an expert opinion poll in which several experts rank all possible combinations of variables according to their probability of leading to the occurrence of disease. Then I will obtain regression parameters for each variable using a conjoint analysis approach in which each expert forms a level (hierarchical design), the six environmental variables are the independent variables, and the rankings are the dependent variable. With six two-level factors, the full (orthogonal) factorial has 2^6 = 64 possible combinations.
I reduced this overwhelming number of combinations (while trying to retain orthogonality) using the AlgDesign package in R. Here is the code, followed by only the relevant pieces of output:

library(AlgDesign)
# Six two-level factors; gen.factorial codes the levels as -1 / +1
levels.design <- c(2, 2, 2, 2, 2, 2)
full.design <- gen.factorial(levels.design)

   X1 X2 X3 X4 X5 X6
1  -1 -1 -1 -1 -1 -1
2   1 -1 -1 -1 -1 -1
3  -1  1 -1 -1 -1 -1
   .................
63 -1  1  1  1  1  1
64  1  1  1  1  1  1

set.seed(69)
# ~ . requests a main-effects model; nTrials is left at its default
fractional <- optFederov(~ ., data = full.design, approximate = FALSE, criterion = "D")
fractional

The result is a subset of 12 combinations to be included in the conjoint analysis:

$design
   X1 X2 X3 X4 X5 X6
4   1  1 -1 -1 -1 -1
5  -1 -1  1 -1 -1 -1
   .................
57 -1 -1 -1  1  1  1

From what I understand, a regression analysis on all 64 combinations should lead to the same regression parameters as one that uses only the reduced set (i.e., the 12 combinations from the fractional factorial design).

Questions:

1. Do the code and the resulting output make sense?
2. Could anyone point me to a good and simple reference on how this fractional design works? I am afraid I might be doing things wrong by selecting a subset that produces different results from those I would obtain with a full factorial design.

Best Answer

Speaking to question 2: you're doing a D-optimal design against the full model using a criterion for linear models. You have no guarantee that main effects are not correlated with other main effects or with higher-order terms. Though your design is likely a non-regular fractional factorial, you might be better served by the regular fraction $2^{6-3}_\mathrm{III}$ in 8 runs, by that design augmented to the 12 runs you want using the $D$-criterion, or by the regular fraction $2^{6-2}_\mathrm{IV}$ in 16 runs. Check out Montgomery's "Design and Analysis of Experiments" or Box, Hunter, and Hunter's "Statistics for Experimenters". Montgomery's book has all the regular fractions in the back. You can also check out the NIST handbook page on fractional factorials.
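As a rough illustration of the regular 8-run fraction, here is a base-R sketch that builds one $2^{6-3}_\mathrm{III}$ design. The generators X4 = X1*X2, X5 = X1*X3, X6 = X2*X3 are one common choice and are an assumption on my part, not something taken from the question; the object names are made up for the example.

```r
# Full 2^3 factorial in the three basic factors, coded -1 / +1
base <- expand.grid(X1 = c(-1, 1), X2 = c(-1, 1), X3 = c(-1, 1))

# Assumed generators: each added factor is a product of two basic factors
regular8 <- transform(base,
                      X4 = X1 * X2,
                      X5 = X1 * X3,
                      X6 = X2 * X3)

# Main-effect columns are mutually orthogonal: X'X is 8 times the identity
X <- as.matrix(regular8)
xtx <- crossprod(X)
print(xtx)

# Resolution III: main effects are aliased with two-factor interactions,
# e.g. the X4 column is identical to the X1:X2 interaction column
aliased <- all(regular8$X4 == regular8$X1 * regular8$X2)
print(aliased)
```

The same kind of check (crossprod or cor on the design columns) can be run on fractional$design to see whether the main-effect columns of the 12-run design are actually uncorrelated.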

A fractional factorial design is guaranteed to give the same parameter estimates as the full factorial only when, for every pair of terms that are correlated or confounded in the fractional design, at most one of the two has a non-zero effect size. It's very hard to get a $D$-optimal design to do this. $D$-optimal designs are mainly used (in industrial settings at least) when there's some kind of restriction on the number of runs you can do (so you can't use a power of 2) or on the design space.
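A small noise-free simulation can make this condition concrete. The sketch below uses made-up coefficients and, as an assumption, an 8-run regular fraction with generators X4 = X1*X2, X5 = X1*X3, X6 = X2*X3 rather than the 12-run design from the question. With main effects only, the fraction reproduces the full-factorial estimates; once a non-zero X1:X2 interaction is present, the X4 estimate from the fraction absorbs it.

```r
# Full 2^6 factorial, coded -1 / +1
full <- expand.grid(rep(list(c(-1, 1)), 6))
names(full) <- paste0("X", 1:6)

# Assumed regular 2^(6-3) fraction: keep rows satisfying the generators
frac <- subset(full, X4 == X1 * X2 & X5 == X1 * X3 & X6 == X2 * X3)

# Hypothetical true models (no noise): main effects only, then plus X1:X2
main_only <- function(d) with(d, 1 + 2 * X1 - X2 + 0.5 * X3 + 3 * X4 + 1.5 * X6)
with_int  <- function(d) main_only(d) + 4 * d$X1 * d$X2

# Fit a main-effects regression to design d with response generated by f
fit <- function(d, f) coef(lm(y ~ X1 + X2 + X3 + X4 + X5 + X6,
                              data = cbind(d, y = f(d))))

# Case 1: no interaction, so full and fraction agree
print(round(cbind(full = fit(full, main_only), frac = fit(frac, main_only)), 6))

# Case 2: X1:X2 is active and aliased with X4 in the fraction
print(round(cbind(full = fit(full, with_int), frac = fit(frac, with_int)), 6))
```

In the second table the fraction's X4 coefficient comes out as 3 + 4 = 7 instead of 3, because X4 and X1:X2 are the same column in that fraction, while the full factorial still recovers 3.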
