Solved – Is logistic regression a valid way of analyzing A/B testing results?

ab-test, experiment-design, logistic, regression-coefficients, statistical-significance

I'm very new to the idea of A/B testing and I want to see if my train of thought here makes sense.

Suppose that I run an experiment with two designs. I get two sets of resulting data, one for each design. The data contain a number of variables that describe user behavior/interaction with the design, such as how many times a user used the product, ratings given by and of the user, etc. To test whether the two data sets are statistically significantly different, I run a logistic regression with lasso regularization (to deal with multicollinearity between my variables) to predict which design each user saw. I'm making a couple of decisions here:

1) Instead of defining one metric (a "conversion" rate) and running a proportions test or a t-test on it, I want to use the coefficients of the logistic regression to see if there are interesting differences between the two designs on each of my "features" (each indicating an aspect of user interaction).

2) I'm assuming that the lasso is a good way to deal with correlated features in logistic regression, and that the resulting coefficients can still be interpreted as in point 1).

3) If a logistic regression model can predict which design was seen by each user better than chance (above 50% accuracy), I can conclude that the two designs have a significantly different impact.
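For point 3), "better than chance" on a finite sample can be made concrete with an exact binomial test on held-out accuracy. This is a minimal sketch, assuming the classifier was evaluated on users it was not trained on and that both designs were shown to equally many users (so chance is 0.5). The function name `accuracy_p_value` and the counts are made up for illustration.

```python
from math import comb


def accuracy_p_value(n_correct, n_test, chance=0.5):
    """Exact one-sided binomial p-value.

    Probability of getting at least `n_correct` of `n_test` held-out
    predictions right if the true accuracy were only `chance`.
    """
    return sum(
        comb(n_test, k) * chance**k * (1 - chance) ** (n_test - k)
        for k in range(n_correct, n_test + 1)
    )


# Made-up held-out results, for illustration only.
print(accuracy_p_value(52, 100))    # 52% accuracy on 100 held-out users
print(accuracy_p_value(530, 1000))  # 53% accuracy on 1000 held-out users
```

The same observed accuracy can be unremarkable on a small test set and unlikely under chance on a large one, so the raw accuracy alone does not settle significance.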

Are these three lines of reasoning correct? If not, what's a better way to approach this issue of not just testing one metric but looking at the resulting set of indicators for variations in user behavior? Also, is this the normal use of logistic regression in this setting or is it generally used for a different purpose?

Thanks in advance!

Best Answer

In A/B testing, each user is generally shown only one design, and you compare its conversion against the control to see A) whether there was a lift and, if so, B) which design is better.
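The comparison against control described above is commonly done with a two-proportion z-test. The following is a rough sketch using only the standard library. The function name `two_proportion_ztest` and the counts are assumptions for illustration, not data from the question.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance).

    Returns (lift, z, p_value), where lift = rate_B - rate_A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_b - p_a, z, p_value


# Made-up counts: 200 of 1000 users convert on A (control), 250 of 1000 on B.
lift, z, p_value = two_proportion_ztest(200, 1000, 250, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value answers A) (there is a lift), and the sign of the lift answers B) (which design is better).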

If your variables describe the users (such as age and gender) and aspects of the design (such as color and size), then by combining the data for both designs with a conversion measure (time spent, or another metric of your choice) you can see which aspects of the design are user dependent.

For example, suppose design A looks more successful overall because the majority of visitors to the website are teenagers, while everyone above 35 liked design B. A single model with conversion as the dependent variable will not capture this interaction unless you add an explicit design-by-age term, but fitting two individual models, one per design, will be able to identify it.
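The teenagers versus over-35 scenario can be illustrated without any model by comparing conversion rates per segment. This sketch swaps in simple per-segment rates for the regression models mentioned above. The counts, segment labels, and function names are made up for illustration.

```python
def conversion_rate(conversions, users):
    return conversions / users


def lifts_by_segment(counts):
    """Return {segment: rate_B - rate_A}.

    `counts` maps (design, segment) -> (conversions, users).
    """
    segments = sorted({segment for _, segment in counts})
    return {
        s: conversion_rate(*counts[("B", s)]) - conversion_rate(*counts[("A", s)])
        for s in segments
    }


def overall_lift(counts):
    """Pooled rate_B - rate_A, ignoring segments."""
    totals = {"A": [0, 0], "B": [0, 0]}
    for (design, _), (conversions, users) in counts.items():
        totals[design][0] += conversions
        totals[design][1] += users
    return conversion_rate(*totals["B"]) - conversion_rate(*totals["A"])


# Hypothetical counts: most visitors are teenagers, who prefer design A.
counts = {
    ("A", "teen"): (90, 300),
    ("B", "teen"): (60, 300),
    ("A", "35+"): (10, 100),
    ("B", "35+"): (25, 100),
}

print("overall lift of B over A:", overall_lift(counts))
print("lift by segment:", lifts_by_segment(counts))
```

With these made-up numbers the pooled lift favors A while the over-35 segment favors B, which is the kind of interaction a pooled comparison hides.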

It would help if you could tell us more about the variables.

Hope this helps.
