Linear Regression – How to Convert Dummy Variable Coefficient into Percentage Change?


Given a model predicting a continuous variable with a dummy feature, how can the coefficient for the dummy variable be converted into a % change?

Example:
shop_sales ~ has_self_checkout

where the coefficient for has_self_checkout=1 is 2.89 with p=0.01

Based on my research, it seems this should be converted into a percentage using (exp(2.89)-1)*100 (example). However, this gives roughly 1700%, which seems far too large and doesn't make sense for my use case. My guess is that this formula is only valid for continuous independent variables and not for dummies, but I'm not sure whether that is correct.
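
For reference, here is what that formula evaluates to (a minimal sketch; only the coefficient 2.89 comes from the question). The formula reads 2.89 as a shift in log(sales), i.e., a multiplication of expected sales by $e^{2.89}\approx 18$, which is not what the coefficient of an untransformed linear model measures.

```python
import math

beta = 2.89  # has_self_checkout coefficient from the question

# (exp(beta) - 1) * 100 is the percentage change in E[y] only when beta is on the log scale.
multiplier = math.exp(beta)
pct_if_log_scale = (multiplier - 1) * 100

print(f"multiplier = {multiplier:.2f}, implied change = {pct_if_log_scale:.0f}%")
# prints: multiplier = 17.99, implied change = 1699%
```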

The mean value of the dependent variable in my data is about 8, so a coefficient of 2.89 seems to imply a ballpark increase of 2.89/8 ≈ 36%.
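
The additive reading can be checked on a small dataset. The numbers below are hypothetical, made up for illustration, and `simple_ols` is just an illustrative helper. With a single 0/1 regressor, ordinary least squares reproduces the group means: the intercept is the mean of shops without self-checkout and the coefficient is the difference in means, so any percentage is that difference divided by whichever baseline you pick.

```python
from statistics import mean

# Hypothetical toy data (not from the question): sales per shop and a 0/1 dummy.
sales = [6, 7, 0, 9, 8, 10, 12, 9, 11, 13]
has_self_checkout = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = simple_ols(has_self_checkout, sales)

# For a 0/1 regressor, b0 equals the mean of the 0 group and b1 the difference in group means.
mean_without = mean([y for y, d in zip(sales, has_self_checkout) if d == 0])
mean_with = mean([y for y, d in zip(sales, has_self_checkout) if d == 1])

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")                      # additive effect in sales units
print(f"vs. shops without: {100 * b1 / mean_without:.1f}%")  # relative to the intercept
print(f"vs. overall mean:  {100 * b1 / mean(sales):.1f}%")   # the 2.89 / 8 style of calculation
```

The two percentages differ even though the fitted effect is the same, which is why the choice of baseline matters for an additive coefficient.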

I also considered log-transforming my dependent variable so that the model would output percentage-change coefficients directly, but since the dependent variable has many 0s (and log(0) is undefined), this would mean losing a lot of meaningful observations.

Best Answer

> The mean value for the dependent variable in my data is about 8, so a coefficient of 2.89 seems to imply roughly a 2.89/8 = 36% increase.

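This is the correct interpretation. The important part is the baseline: your dummy adds 2.89 to expected sales, which is a 36% increase relative to the overall mean of 8. (With a single dummy, the intercept, i.e., the mean sales of shops without self-checkout, is the more natural baseline, so the percentage can differ somewhat from 36% if that mean is not close to 8.) Because the effect is additive, its size in percent depends on the level it is added to: if you have a different dummy with a coefficient of (say) 3, then in the presence of that other dummy your focal dummy will only yield a percentage increase of $\frac{2.89}{8+3}\approx 26\%$.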
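
The arithmetic can be written out as a tiny helper (the name `pct_increase` is just for illustration): the same additive effect of 2.89 is a smaller percentage when the baseline it is added to is larger.

```python
def pct_increase(effect, baseline):
    """Percentage increase implied by an additive effect at a given baseline level."""
    return 100 * effect / baseline


beta_self_checkout = 2.89  # coefficient from the question
baseline = 8               # approximate mean sales, as in the question
beta_other = 3             # hypothetical second dummy from the example above

alone = pct_increase(beta_self_checkout, baseline)
with_other = pct_increase(beta_self_checkout, baseline + beta_other)

print(f"{alone:.0f}%")       # prints 36%
print(f"{with_other:.0f}%")  # prints 26%
```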

An alternative would be to model your data using a log link. For instance, you could model sales (which, after all, are discrete counts) with a Poisson regression, where the conditional mean is usually modeled as $\exp(X\beta)$ with design matrix $X$ and parameters $\beta$. In this setting, switching dummy $i$ from 0 to 1 multiplies the conditional mean by $\exp(\beta_i)$, so you can use the $(\exp(\beta_i)-1)\times 100\%$ formula, and only in this setting. (Note that your zeros are not a problem for a Poisson regression.) Moreover, the percentage effect of one dummy will not depend on the other regressors, unless you explicitly model interactions.
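
Here is a minimal sketch of such a Poisson fit in plain Python, so it runs without any statistics package; in practice you would use a GLM routine instead. Assumptions: the data are made up, the helper names `poisson_fit_one_dummy` and `dummy_ratio` are hypothetical, and the fitter handles only an intercept plus one 0/1 dummy. It maximizes the Poisson log-likelihood with a log link by Newton-Raphson, and the zero count needs no special treatment.

```python
import math

# Hypothetical toy data (made up for illustration); the 0 is fine under a Poisson model.
sales = [6, 7, 0, 9, 8, 10, 12, 9, 11, 13]
has_self_checkout = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def poisson_fit_one_dummy(y, d, iters=25):
    """Fit log E[y] = b0 + b1 * d by Newton-Raphson on the Poisson log-likelihood.

    Only handles an intercept plus one 0/1 dummy; both groups must be non-empty
    and contain at least one non-zero count for the estimates to be finite.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the overall mean
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * di) for di in d]
        # Score vector X^T (y - mu) for the design matrix X = [1, d].
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(di * (yi - mi) for yi, mi, di in zip(y, mu, d))
        # Fisher information X^T diag(mu) X; since d * d == d, the off-diagonal
        # entries and the lower-right entry all equal sum(mu * d).
        i00 = sum(mu)
        i01 = sum(mi * di for mi, di in zip(mu, d))
        det = i00 * i01 - i01 * i01
        # Newton step: beta += I^{-1} U, using the explicit 2x2 inverse.
        b0 += (i01 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1


b0, b1 = poisson_fit_one_dummy(sales, has_self_checkout)

# With a log link, switching the dummy from 0 to 1 multiplies E[y] by exp(b1).
pct_change = (math.exp(b1) - 1) * 100
print(f"exp(b1) = {math.exp(b1):.3f}, percentage change = {pct_change:.1f}%")


def dummy_ratio(b0, b1, b2, other):
    """E[y | d1=1, d2=other] / E[y | d1=0, d2=other] for log E[y] = b0 + b1*d1 + b2*d2."""
    return math.exp(b0 + b1 + b2 * other) / math.exp(b0 + b2 * other)


# Hypothetical second dummy with b2 = 0.4: the ratio is exp(b1) whatever its value.
print(dummy_ratio(b0, b1, 0.4, 0), dummy_ratio(b0, b1, 0.4, 1))
```

With a single dummy, the fitted $\exp(b_1)$ is simply the ratio of the two group means (here $11/6$, roughly an 83% increase). The `dummy_ratio` check illustrates the point about other regressors: without an interaction term, the $b_2$ terms cancel in the ratio.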

Alternatively, you could look into negative binomial regression, which uses the same log-link parameterization for the mean, so the same calculation gives percentage changes. Unlike the Poisson model, it also allows the variance to exceed the mean (overdispersion).

Bottom line: I'd really recommend that you look into Poisson/negbin regression.
