Solved – How to design experiments for Market Research (with a twist)

Consider a kind of auction where you are presented with, say, 1000 prospective clients. Based on information about these prospectives–age, sex, race, income, educational achievement and the like–you may 'bid' to pitch your product to some fraction of them, say 250. (Ignore the cost of bidding.) To maximize your chances of selecting the right subset, I would probably use a model of our product's 'likability' built using logistic regression. I can fumble my way through this part well enough.
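The selection step described above can be sketched as follows: score every prospect with a fitted logistic model and bid on the top fraction. This is a minimal sketch, assuming the categorical factors have already been dummy-coded into numeric features. The helper names `predicted_likability` and `choose_bids`, the feature names and the coefficient values are hypothetical; in practice the coefficients would come from the market-research fit.

```python
import math
from typing import Dict, List


def predicted_likability(features: Dict[str, float], coefs: Dict[str, float],
                         intercept: float) -> float:
    """Logistic model: P(likes product) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    eta = intercept + sum(coefs.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-eta))


def choose_bids(prospects: List[Dict[str, float]], coefs: Dict[str, float],
                intercept: float, fraction: float) -> List[int]:
    """Return indices of the top `fraction` of prospects, ranked by predicted likability."""
    k = round(fraction * len(prospects))  # e.g. 0.25 * 1000 = 250 bids
    scores = [predicted_likability(p, coefs, intercept) for p in prospects]
    ranked = sorted(range(len(prospects)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical example: a single dummy-coded/standardized feature.
prospects = [{"income": 0.0}, {"income": 3.0}, {"income": 1.0}, {"income": 2.0}]
print(choose_bids(prospects, {"income": 1.0}, intercept=0.0, fraction=0.5))
```

If the model's probabilities are calibrated, taking the top $k$ by predicted probability maximizes the expected number of receptive prospects among the $k$ chosen.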

However, to build the model of product likability, I have to do some market research, testing the product pitch on subjects that we may recruit from the population at large. This is actually fairly costly. Moreover, it should probably be tuned to the demographics of the population from which the prospectives are drawn. For example, a factorial design, say, might ask us to recruit test subjects in equal proportion across the levels of the race factor, when in reality we are likely to encounter very few Native Americans, say, in the 1000 prospectives, and can simply choose not to pitch to them at all as a rule. (Sad, but true.)

How should such an experiment be designed? To be concrete, the design variables are all categorical and ordinal factors, the bidding fraction is an input parameter (1/4 in the example quoted above), as is the maximum number of subjects that can be recruited. It seems like maybe some mix of experimental design and random sampling might be appropriate, but I am open to all reasonable suggestions and pointers.

I should also note that given the likely small effect sizes and the small sample recruitment pools that we can afford, it is unlikely that the market research will yield statistically significant regression coefficients. And so overoptimizing the experimental design is probably silly, and any reasonably non-insane procedure will suffice.

Best Answer

One approach to your problem is to use a stratified sample. One purpose of stratification is to make sure that certain domains (groups) of the population are represented in the sample that would otherwise be represented too sparsely for valid inference, e.g. because of a small selection probability.

For example, if "Native Americans" is an important group in terms of your estimates from the 'likeability model', but their selection probability is very small, a simple random sample (SRS) of size $n=50$ might contain no or only very few units of this type. If you then include Nat. Am. as an indicator variable in the model, the estimates may be extremely unreliable (large standard errors), or the parameter may not be estimable at all. The goal of a stratified sample is to avoid this.
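To put a number on "no or only very few units", the chance that an SRS of size $n$ contains fewer than $k$ members of a group with population share $p$ is a binomial tail probability. A minimal sketch, assuming the population is large relative to $n$ so that the binomial approximates sampling without replacement; the 2% share below is a hypothetical figure, not from the question.

```python
from math import comb


def prob_fewer_than(k: int, n: int, p: float) -> float:
    """P(X < k) for X ~ Binomial(n, p): the chance that an SRS of size n
    yields fewer than k members of a group with population share p.
    The binomial approximates sampling without replacement when N >> n."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k))


# Hypothetical: a group making up 2% of the population, SRS of n = 50.
print(round(prob_fewer_than(1, 50, 0.02), 3))  # chance of zero group members
print(round(prob_fewer_than(5, 50, 0.02), 3))  # chance of fewer than 5
```

With $p=0.02$ and $n=50$, the chance of getting no group members at all is $0.98^{50} \approx 0.36$, and the expected count is only 1.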

Stratifying for this purpose means selecting units from the rare strata with a higher probability than they would have in an SRS (oversampling). When estimating your logistic/polytomous regression, you can use design weights to adjust for the higher selection probability. A (relative) weight is then commonly defined as $$w_i=\frac{\pi_{pop}}{\pi_s},$$ where $\pi_s$ is unit $i$'s selection probability into the stratified sample, and $\pi_{pop}$ is its selection probability under an SRS of the same size. Oversampled units ($\pi_s > \pi_{pop}$) therefore receive weights below 1, and undersampled units weights above 1.
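The weight computation can be sketched per stratum under the usual inverse-probability convention, $w_h = \pi_{pop}/\pi_s$, which down-weights oversampled strata. This is a minimal sketch, assuming within-stratum simple random sampling, so that $\pi_s = n_h/N_h$ for every unit in stratum $h$ and $\pi_{pop} = n/N$. The function name and the stratum counts in the example are hypothetical.

```python
from typing import Dict


def relative_design_weights(pop_counts: Dict[str, int],
                            sample_counts: Dict[str, int]) -> Dict[str, float]:
    """Relative design weight per stratum h: w_h = pi_pop / pi_s.

    pi_pop = n / N      inclusion probability under an SRS of the same total size n
    pi_s   = n_h / N_h  inclusion probability within stratum h of the stratified sample
    Oversampled strata (pi_s > pi_pop) get weights below 1.
    """
    N = sum(pop_counts.values())
    n = sum(sample_counts.values())
    pi_pop = n / N
    return {h: pi_pop / (sample_counts[h] / pop_counts[h]) for h in pop_counts}


# Hypothetical: 1000 people, 100 of them in a small group "B" that is oversampled.
weights = relative_design_weights({"A": 900, "B": 100}, {"A": 40, "B": 10})
print(weights)  # approximately {'A': 1.125, 'B': 0.5}
```

As a check, the weighted sample counts ($40 \times 1.125 = 45$ and $10 \times 0.5 = 5$) reproduce the population shares of 90% and 10% while still summing to $n=50$.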

The problem in your particular application is that, given the small sample size (say $n=50$), you probably cannot stratify on all the characteristics you mention. Stratification usually requires crossing all characteristics and sampling from every cell of the resulting contingency table. The number of cells grows multiplicatively with the number of characteristics and the number of categories of each, and beyond some level of complexity a fixed $n=50$ can no longer fill all cells sufficiently.
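The growth in cells is easy to check: with full crossing, the number of cells is the product of the category counts. A minimal sketch; the characteristics and their numbers of categories below are hypothetical.

```python
from math import prod
from typing import Dict


def cell_count(levels: Dict[str, int]) -> int:
    """Number of cells when all characteristics are fully crossed."""
    return prod(levels.values())


# Hypothetical category counts for four characteristics.
levels = {"sex": 2, "race": 5, "income_band": 4, "education": 4}
cells = cell_count(levels)
print(cells, 50 / cells)  # 160 cells, about 0.31 units per cell at n = 50
```

Even this modest set of characteristics gives 160 cells, i.e. fewer than one unit per cell on average at $n=50$.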

My advice is therefore to screen your characteristics as follows. First, list all the characteristics you want in the final model, either because you expect them to have predictive power for 'likeability' or because they identify groups that are important in the bidding process. Second, among these characteristics, distinguish between those with a high and those with a low selection probability during sampling. A selection probability is low if an SRS of size $n$ will probably give you too few observations in one of the categories.

For example, 'gender' will usually be well represented, with roughly 50/50 shares in the population, so even with $n=50$ you will have 'sufficient' men and women. 'Native American', by contrast, may be rare in the population yet still important for your model. A power analysis might provide further guidance if needed, but it depends on the particular model and can be very complex for polytomous regression.
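The screening step can be sketched as a simple rule: flag a characteristic as a stratification candidate if the expected SRS count $n \cdot p$ of its rarest category falls below some minimum. This is a minimal sketch; the threshold of 5 expected units is a hypothetical rule of thumb, not a recommendation from the answer, and the population shares are invented for illustration.

```python
from typing import Dict, List


def stratification_candidates(shares: Dict[str, Dict[str, float]], n: int,
                              min_expected: float = 5.0) -> List[str]:
    """Return characteristics whose rarest category has an expected SRS count
    n * p below min_expected (min_expected is an assumed rule of thumb)."""
    return [name for name, cats in shares.items()
            if n * min(cats.values()) < min_expected]


# Hypothetical population shares.
shares = {
    "sex": {"male": 0.5, "female": 0.5},
    "race": {"white": 0.70, "black": 0.13, "hispanic": 0.09,
             "asian": 0.06, "native_american": 0.02},
}
print(stratification_candidates(shares, n=50))  # ['race']
```

At $n=50$, each sex has an expected count of 25, while the rarest race category has an expected count of 1, so only race is flagged.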

The characteristics with too low a selection probability are the candidates for stratification, whereas the variables with high enough, balanced selection probabilities across their categories can be ignored in the sampling design. Once you have identified the crucial strata for your population and model, you can build the sampling design on them (i.e. randomly sample within every relevant stratum to fill all 'cells').
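Drawing the sample itself can be sketched as an SRS without replacement within each stratum, with a fixed allocation per stratum. This is a minimal sketch, assuming a sampling frame that lists candidate subjects with a known stratum label is available; the question does not say how subjects are recruited. The function name, the stratum labels and the allocation below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


def stratified_sample(units: Sequence[Hashable], stratum_of: Callable[[Hashable], Hashable],
                      allocation: Dict[Hashable, int], seed: int = 0) -> List[Hashable]:
    """Draw allocation[h] units by simple random sampling without replacement
    within each stratum h. Raises ValueError if a stratum has too few units."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[stratum_of(u)].append(u)
    sample: List[Hashable] = []
    for h, n_h in allocation.items():
        sample.extend(rng.sample(by_stratum[h], n_h))
    return sample


# Hypothetical frame of 100 people; ids 0-4 belong to a rare stratum.
frame = list(range(100))
label = lambda u: "rare" if u < 5 else "common"
s = stratified_sample(frame, label, {"rare": 5, "common": 20})
print(len(s), sum(1 for u in s if label(u) == "rare"))  # 25 5
```

Under an SRS of 25 from this frame, the rare stratum would contribute only about 1.25 units on average; the fixed allocation guarantees 5, and the design weights then correct for the oversampling.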

I hope that when doing this you will end up with few enough strata to go ahead with a sample of size $n=50$.
