Solved – Promotion analysis with regression, negative coefficients

econometrics, interpretation, regression, regression coefficients

I used multiple linear regression to model the effects of promotions on sales at a sample retail store, but some of the coefficients are negative. From a business perspective, should I interpret these promotions as inefficient, or does it mean I can't use any results from the model, not even the positive coefficients? Example results follow:

                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         8.46361    0.59232  14.289  < 2e-16 ***
REL_PRICE_DISCOUNT -0.26608    0.08828  -3.014  0.00303 ** 
FEATURE             5.24377    1.64297   3.192  0.00173 ** 
DISPLAY             2.98531    2.30882   1.293  0.19801    
TPR_ONLY           -2.17929    2.33328  -0.934  0.35181  

In this example, should I use the coefficients of FEATURE and DISPLAY to report that they are effective promotions and discard TPR_ONLY and REL_PRICE_DISCOUNT, or should I not use the model at all and discard it completely?

Best Answer

It's been some time since you posted this question, so I hope my response is still relevant.

I'm an analyst at a large retailer, specializing in promotion modeling, and we do a lot of work in this area, using regression techniques to estimate the effects of promotions alongside the various other factors driving sales performance.

I'd like to answer your question, but I need more details regarding your model.

How is the data in your table organized? Does it include historical data for each product, or just a snapshot of the SKU selection at a given time?

Based on what information you've presented, here are my thoughts:

It may be that the variable REL_PRICE_DISCOUNT is identifying that products on discount tend to have lower sales. Think of expensive brands with lower sales volumes compared to high-volume, inexpensive products: if the expensive brands are the ones on sale, the coefficient will capture their typically smaller volumes rather than the effect of the discount itself.
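A small simulation can make this concrete. The sketch below uses made-up numbers, not the asker's data: every SKU gains 0.5 units per percentage point of discount, but, by assumption, the low-volume premium SKUs are discounted far more often than the high-volume ones. A regression pooled across SKUs, which is all snapshot data allows, then returns a negative discount coefficient. The function names and parameters are hypothetical; only NumPy is required.

```python
import numpy as np


def simulate_snapshot(n_skus=400, true_lift=0.5, seed=0):
    """Hypothetical snapshot: one row per SKU, no sales history."""
    rng = np.random.default_rng(seed)
    premium = rng.random(n_skus) < 0.5  # expensive, low-volume brands
    baseline = np.where(premium, 20.0, 100.0) + rng.normal(0, 5, n_skus)
    # Assumption: premium SKUs are much more likely to be on discount
    promoted = rng.random(n_skus) < np.where(premium, 0.8, 0.2)
    discount = np.where(promoted, rng.uniform(10, 30, n_skus), 0.0)  # % off
    # Every SKU truly gains true_lift units per percentage point of discount
    sales = baseline + true_lift * discount + rng.normal(0, 5, n_skus)
    return discount, sales


def pooled_discount_slope(discount, sales):
    """OLS of sales on an intercept and discount, pooled across SKUs."""
    X = np.column_stack([np.ones_like(discount), discount])
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return coef[1]


discount, sales = simulate_snapshot()
print(f"pooled discount coefficient: {pooled_discount_slope(discount, sales):.2f}")
```

With these settings the pooled coefficient comes out negative even though the true effect of the discount is positive for every SKU.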

If you model a particular SKU's sales history and price (incl. discounts), you may see a different story, one that captures the response to the promotion.
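Here is a rough sketch of that single-SKU approach, again on simulated data with assumed values: two years of weekly sales, a promotion roughly one week in four, and a true lift of 0.5 units per percentage point of discount. Because the product's baseline volume is the same every week, the regression effectively compares promoted and non-promoted weeks of the same SKU and should land close to the true lift. The function `single_sku_lift` and its parameters are hypothetical.

```python
import numpy as np


def single_sku_lift(weeks=104, true_lift=0.5, seed=1):
    """Regress one hypothetical SKU's weekly sales on its own discount depth."""
    rng = np.random.default_rng(seed)
    baseline = 20.0  # a low-volume premium SKU
    on_promo = rng.random(weeks) < 0.25  # assumption: promoted about 1 week in 4
    discount = np.where(on_promo, rng.uniform(10, 30, weeks), 0.0)  # % off
    sales = baseline + true_lift * discount + rng.normal(0, 2, weeks)
    X = np.column_stack([np.ones(weeks), discount])
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return coef[1]


print(f"within-SKU discount coefficient: {single_sku_lift():.2f}")
```

In practice a real sales history would also need controls for seasonality, trend and other promotions running at the same time; this sketch omits them.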

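Other than that, TPR_ONLY has a p-value greater than 0.05 and should be removed from the model (if you go by the p-value criterion), as you suggested. By the same criterion, DISPLAY (p = 0.198) is not significant either, so its coefficient shouldn't be reported as evidence of an effective promotion.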
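Applying the p-value criterion to the output in the question is mechanical. The sketch below copies the reported estimates, standard errors and p-values, recomputes each t value as estimate / standard error (which reproduces the "t value" column), and splits the terms at alpha = 0.05. The names `COEFFICIENTS`, `t_value` and `screen` are only for illustration.

```python
# Estimates, standard errors and p-values copied from the output in the question.
# The intercept's p-value was reported as "< 2e-16"; 2e-16 is used as an upper bound.
COEFFICIENTS = {
    "(Intercept)": (8.46361, 0.59232, 2e-16),
    "REL_PRICE_DISCOUNT": (-0.26608, 0.08828, 0.00303),
    "FEATURE": (5.24377, 1.64297, 0.00173),
    "DISPLAY": (2.98531, 2.30882, 0.19801),
    "TPR_ONLY": (-2.17929, 2.33328, 0.35181),
}


def t_value(estimate, std_error):
    """t statistic = estimate / standard error."""
    return estimate / std_error


def screen(coefficients, alpha=0.05):
    """Split terms into those that pass and those that fail a p < alpha rule."""
    keep = [name for name, (_, _, p) in coefficients.items() if p < alpha]
    drop = [name for name, (_, _, p) in coefficients.items() if p >= alpha]
    return keep, drop


for name, (est, se, p) in COEFFICIENTS.items():
    print(f"{name:20s} t = {t_value(est, se):7.3f}  p = {p:.5f}")

keep, drop = screen(COEFFICIENTS)
print("pass p < 0.05:", keep)
print("fail p < 0.05:", drop)
```

If you do drop terms, refit the model rather than just deleting rows from this table, since removing a predictor generally changes the remaining estimates.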

An important thing to think about here is how your model is interpreting the data. If no variable accounts for the differences in baseline sales (e.g. market share) between brands/SKUs, and your model is based on snapshot data (no history), then your model is likely just identifying that products on promotion have lower sales than other products. That does not necessarily mean the promotion is causing sales to be lower.

One option is to include a variable that accounts for the natural difference in sales between SKUs, such as market share (%). You'll have to explore options like these and see how they affect your model output. Whenever you add variables, make sure they aren't highly correlated with the other variables in your model.
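A minimal sketch of the market-share idea, under clearly stated assumptions: the data are simulated, and the hypothetical `share` column is computed from each SKU's true baseline volume, so it is a much cleaner control than any real market-share figure would be. The script fits the discount coefficient with and without the share variable, then checks how correlated the two predictors are using their correlation and the variance inflation factor (VIF).

```python
import numpy as np


def simulate_snapshot(n_skus=400, true_lift=0.5, seed=0):
    """Hypothetical snapshot data in which low-volume SKUs are discounted more often."""
    rng = np.random.default_rng(seed)
    premium = rng.random(n_skus) < 0.5
    baseline = np.where(premium, 20.0, 100.0) + rng.normal(0, 5, n_skus)
    promoted = rng.random(n_skus) < np.where(premium, 0.8, 0.2)
    discount = np.where(promoted, rng.uniform(10, 30, n_skus), 0.0)  # % off
    sales = baseline + true_lift * discount + rng.normal(0, 5, n_skus)
    share = 100 * baseline / baseline.sum()  # stand-in for market share (%)
    return discount, share, sales


def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, *slopes]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


discount, share, sales = simulate_snapshot()
without_share = ols([discount], sales)[1]
with_share = ols([discount, share], sales)[1]

# With two predictors, the VIF is 1 / (1 - r^2), where r is their correlation
r = np.corrcoef(discount, share)[0, 1]
vif = 1.0 / (1.0 - r**2)
print(f"discount coef without share: {without_share:.2f}")
print(f"discount coef with share:    {with_share:.2f}")
print(f"corr(discount, share) = {r:.2f}, VIF = {vif:.2f}")
```

In this simulation, adding the share variable moves the discount coefficient from negative to roughly the assumed 0.5 while the VIF stays low; real market-share data would likely give a noisier correction.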

Don't give up and good luck!
