I am interested in multivariate investigations. I have been trying to learn about designing experiments where there is one dependent variable (a class/group) and many independent variables that would ideally help discriminate between the groups. However, the 'basic' issue I keep running into is sample size calculation. I have been poring over the literature to see if there is a consensus on how to calculate sample size for a multivariate investigation. Does anyone here know of a method to accomplish this? I am primarily familiar with R, but I am also competent with SAS and Python.
Solved – Sample size calculation for multivariate problems
multivariate-analysis, r, sample-size
Related Solutions
It's hard to ignore the wealth of statistical packages available in R/CRAN. That said, I spend a lot of time in Python land and would never dissuade anyone from having as much fun as I do. :) Here are some libraries/links you might find useful for statistical work.
NumPy/Scipy You probably know about these already. But let me point out the Cookbook where you can read about many statistical facilities already available and the Example List which is a great reference for functions (including data manipulation and other operations). Another handy reference is John Cook's Distributions in Scipy.
pandas This is a really nice library for working with statistical data -- tabular data, time series, panel data. Includes many builtin functions for data summaries, grouping/aggregation, pivoting. Also has a statistics/econometrics library.
larry Labeled array that plays nice with NumPy. Provides statistical functions not present in NumPy and good for data manipulation.
python-statlib A fairly recent effort which combined a number of scattered statistics libraries. Useful for basic and descriptive statistics if you're not using NumPy or pandas.
statsmodels Statistical modeling: Linear models, GLMs, among others.
scikits Statistical and scientific computing packages -- notably smoothing, optimization and machine learning.
PyMC For your Bayesian/MCMC/hierarchical modeling needs. Highly recommended.
PyMix Mixture models.
Biopython Useful for loading your biological data into python, and provides some rudimentary statistical/machine learning tools for analysis.
If speed becomes a problem, consider Theano -- used with good success by the deep learning people.
There's plenty of other stuff out there, but this is what I find the most useful along the lines you mentioned.
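To make the list above a bit more concrete, here is a minimal sketch (on made-up data) showing how pandas and statsmodels fit together for the kind of group-discrimination problem in the question; the variable names and effect sizes are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# made-up data: a binary group label and two candidate discriminating variables
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "group": group,
    "x1": rng.normal(loc=0.5 * group, size=n),  # x1 shifts with group
    "x2": rng.normal(size=n),                   # x2 carries no signal
})

# pandas: quick per-group summaries
print(df.groupby("group")[["x1", "x2"]].mean())

# statsmodels: logistic regression of group membership on the predictors
X = sm.add_constant(df[["x1", "x2"]])
print(sm.Logit(df["group"], X).fit(disp=0).summary())
```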
Commonly, the different values that a factor can attain in an experiment are called "levels". So let's say there are $k$ factors, and factor $j$ has $n_j$ levels.
There are $n_1 \cdot n_2 \cdots n_k$ possible factor combinations, i.e. possible versions of web pages that could be viewed. To answer the question of whether any one of these versions is better than any other, each has to be viewed a certain number of times, say $N$ for simplicity, so any pairwise comparison of two versions involves a sample of $2N$ views. (You assumed $N = 100$.) The total sample size required (the total number of pairs of eyes that you'll need for all versions) is therefore $$ N \cdot n_1 \cdot n_2 \cdots n_k, $$ which can become pretty large, although it is generally smaller than your formula.
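As a quick illustration of how fast that product grows, here is a small Python calculation with made-up level counts (the numbers of factors and levels are arbitrary assumptions, not taken from the question):

```python
from math import prod

levels = [3, 2, 4, 2]        # hypothetical: four factors with 3, 2, 4, 2 levels
N = 100                      # views per version, as in the question
n_versions = prod(levels)    # number of distinct page versions
print(n_versions, N * n_versions)   # 48 versions -> 4,800 views in total
```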
The size of $N$ in turn depends on how well separated the purchase probabilities are that you want to distinguish. If all purchase probabilities are close to each other, then $N$ has to be quite large to pick out the larger probability reliably, even in a simple pairwise comparison. For example: with $N = 100$, one particular page design with purchase probability $p = .5$, and a test at the 5% significance level, you will have a better-than-even chance of correctly identifying another design as better only if that design has purchase probability of at least $p = .62$ or so. If the other design has $p = .55$, you won't be able to tell with $N = 100$, even though that difference means 10% more revenue. The smaller the differences in probabilities, the larger the sample size you are forced to work with.
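If you want to reproduce numbers like these rather than take them on faith, statsmodels exposes the standard two-proportion power calculations; the sketch below is one way to check the $p = .5$ vs. $p = .62$ example and to see how the required $N$ grows for $p = .55$ (exact figures depend on the one- vs. two-sided choice and the approximation used):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

analysis = NormalIndPower()

# Power to detect p = .62 vs p = .5 with N = 100 views per design,
# one-sided test at the 5% level (roughly a coin flip's chance).
h_big = proportion_effectsize(0.62, 0.50)
print(analysis.power(effect_size=h_big, nobs1=100, alpha=0.05,
                     alternative="larger"))

# Views per design needed for the same (about 50%) power when the
# competing design has p = .55 instead: considerably more than 100.
h_small = proportion_effectsize(0.55, 0.50)
print(analysis.solve_power(effect_size=h_small, alpha=0.05, power=0.5,
                           alternative="larger"))
```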
In practice, one would not use all possible level combinations for all factors, because experience shows that interactions between multiple factors rarely matter. For example if you have four factors (say number of headings, number of images, number of columns, background color), then it is likely that once two have been set (say number of headings and number of images), the other two factors don't matter that much any more. This can be used to reduce the total number of level combinations. Google "fractional factorial design".
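A crude way to see what a fractional factorial buys you, without any specialised package, is to enumerate a full two-level design and keep only a half fraction defined by a generator; the sketch below (four two-level factors, generator D = ABC) only illustrates the idea, not how to choose a good fraction:

```python
from itertools import product

# full 2^4 factorial in -1/+1 coding: every combination of four two-level factors
full = list(product([-1, 1], repeat=4))

# half fraction 2^(4-1) defined by the generator D = ABC:
# keep only the runs in which the fourth factor equals the product of the first three
half = [run for run in full if run[3] == run[0] * run[1] * run[2]]

print(len(full), len(half))   # 16 runs in the full design, 8 in the half fraction
```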
Best Answer
Sample size calculation is a part of power analysis. There are two general ways to approach a power analysis: You can rely on some canned program or you can simulate.
To rely on a canned program, you will need to know what test you are using (e.g. linear regression, logistic regression, etc.), the effect size that you want to be able to detect, and your tolerances for type I and type II errors.
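For the canned route in Python, statsmodels has the usual power classes; as one hedged example, a one-way comparison of several groups can be sized with FTestAnovaPower (the effect size, number of groups, and error tolerances below are made-up placeholders you would replace with your own):

```python
from statsmodels.stats.power import FTestAnovaPower

# hypothetical inputs: 3 groups, Cohen's f = 0.25 ("medium" effect),
# 5% type I error, 80% power (type II error tolerance of 0.20)
n = FTestAnovaPower().solve_power(effect_size=0.25, k_groups=3,
                                  alpha=0.05, power=0.80)
print(n)   # required sample size (total observations, in statsmodels' parameterisation)
```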
All three of the languages you mention have facilities for power analysis. SAS, for example, has PROC POWER and PROC GLMPOWER.
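The simulation route mentioned above is often the more honest option for a multi-predictor classification problem, because you bake in exactly the model and test you intend to run. Below is a minimal sketch, with an entirely made-up data-generating process (binary group, five predictors, only two of which carry an effect), that estimates power for the overall likelihood-ratio test of a logistic regression at several candidate sample sizes; the same pattern carries over to LDA, MANOVA, or whatever analysis you actually plan:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# hypothetical truth: intercept 0, and only the first two of five predictors
# carry an effect (log-odds coefficients 0.4 and 0.3)
beta = np.array([0.0, 0.4, 0.3, 0.0, 0.0, 0.0])

def estimated_power(n, n_sims=500, alpha=0.05):
    """Fraction of simulated datasets in which the overall likelihood-ratio
    test of the logistic model rejects at level alpha."""
    rejections, fits = 0, 0
    for _ in range(n_sims):
        X = sm.add_constant(rng.normal(size=(n, 5)))
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        y = rng.binomial(1, p)
        try:
            res = sm.Logit(y, X).fit(disp=0)
        except Exception:
            continue   # skip rare numerical failures (e.g. perfect separation)
        fits += 1
        if res.llr_pvalue < alpha:
            rejections += 1
    return rejections / fits

# sweep candidate sample sizes and report the estimated power for each
for n in (50, 100, 200, 400):
    print(n, estimated_power(n))
```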