Solved – Which one to use: splines, interactions, or both?

interaction, logistic regression, splines

I am modeling a binary event: whether or not a sale happens for an online retailer. I have millions of clicks to work with. As expected, the sale rate is very low (<1%). To model this event, I am using product price as one of the predictor variables in a logistic regression model.

  1. Right now I am using piecewise linear models (linear splines). However, I read in a paper that cubic splines are better when the relationship is highly nonlinear. What is an objective way to judge whether cubic splines should be used?
  2. Currently I place knots based on business sense and intuition. Is there a better way to choose the number of knots and their placement?
  3. In one of my test runs, I found that the interaction between price and seller (another of the predictor variables) plays an important role. Could this interaction also explain some of the nonlinearity?

I know that interactions and splines should both help me capture nonlinearity in my independent variable. Which one should I choose, and how do I decide whether to use a combination of both?

Best Answer

Without going into detail,

  • cubic splines have advantages over linear splines: they better reflect smooth underlying relationships and are less sensitive to knot placement
  • knots can be placed using subject matter knowledge or using the observed data density (e.g., put knots at fixed quantiles of a predictor)
  • getting nonlinear main effects right by modeling them flexibly is necessary before considering interactions
  • interactions between regression splines are called tensor splines and can give you a smooth interaction surface. This involves taking all products of terms making up the regression splines in the main effects
  • consider how the number of purchases (events) limits how many parameters (degrees of freedom) you can afford to spend on each predictor and on the interactions
  • the R rms package has a restricted interaction operator %ia% that creates tensor splines with fewer terms. The restriction is that the interactions are not doubly nonlinear, i.e., products of two nonlinear terms are left out.
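
To make the points above about quantile-based knots, cubic spline terms, and restricted interactions more concrete, here is a minimal Python sketch using only the standard library. It is not the rms implementation. The function names, the made-up price data, and the five knot quantiles (0.05, 0.275, 0.5, 0.725, 0.95) are assumptions chosen for illustration, and the basis is the usual truncated-power form of a restricted (natural) cubic spline, which is linear beyond the outer knots.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def quantile_knots(values, probs=(0.05, 0.275, 0.5, 0.725, 0.95)):
    """Place knots at fixed quantiles of the predictor (probs are an assumption)."""
    s = sorted(values)
    return [quantile(s, p) for p in probs]


def rcs_basis(x, knots):
    """Restricted cubic spline terms at x: one linear term plus k - 2 nonlinear terms."""
    t_prev, t_last = knots[-2], knots[-1]
    scale = (knots[-1] - knots[0]) ** 2  # keeps nonlinear terms on the scale of x

    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    terms = [x]
    for t_j in knots[:-2]:
        term = (cube_plus(x - t_j)
                - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
                + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev))
        terms.append(term / scale)
    return terms


def tensor_terms(a, b):
    """Full tensor interaction: every product of a term from a with a term from b."""
    return [u * v for u in a for v in b]


def restricted_terms(a, b):
    """Restricted interaction: drop products of two nonlinear terms.

    Index 0 of each basis is the linear term, so at least one factor is linear.
    """
    return [a[0] * v for v in b] + [u * b[0] for u in a[1:]]


# Hypothetical data: prices 1..101, and a second continuous predictor that
# reuses the same knots purely to keep the example short.
prices = [float(p) for p in range(1, 102)]
knots = quantile_knots(prices)
price_terms = rcs_basis(50.0, knots)
other_terms = rcs_basis(30.0, knots)
print(len(tensor_terms(price_terms, other_terms)),
      len(restricted_terms(price_terms, other_terms)))
```

With five knots each predictor contributes four main-effect terms, so the full tensor interaction adds 16 product terms while the restricted version adds 7. That difference matters when the number of purchases limits how many parameters the model can support.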