Solved – limit regression prediction to positive (Sklearn)

lasso, python, random forest, regression, ridge regression

I have a continuous dependent variable ranging from 0 to 100 that represents restaurant health violations. Given the nature of the variable, it does not make sense for a regression model to predict a negative number of violations. I would like to constrain the predictions of several regression algorithms that I am running in scikit-learn (OLS, Lasso, Ridge, Random Forest) to be non-negative.

Other responses to this problem (example) state that "If your DV is never negative then you can take the log. Then the predicted values on the raw score would never be negative."

I used numpy to take the log of my DV, but my predictions still come back negative (I don't know why they would be). How can I address this issue, specifically with an implementation in Python?
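For concreteness, a minimal sketch of that log-transform setup (the Ridge regressor and synthetic data are stand-ins, not the asker's actual pipeline). The point it illustrates: a model fit on `log(y)` predicts on the log scale, where negative values simply mean y < 1; only after exponentiating back are the predictions guaranteed positive.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Stand-in data: X is a feature matrix, y holds strictly positive
# violation scores (plain log requires y > 0; zeros would need a shift).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.clip(rng.normal(loc=20, scale=15, size=200), 1, 100)

# Fit on the log of the DV, as the linked advice suggests.
model = Ridge(alpha=1.0).fit(X, np.log(y))

# These predictions live on the *log* scale, so values below 0 are
# expected; they only become guaranteed-positive after np.exp.
log_preds = model.predict(X)
raw_preds = np.exp(log_preds)  # always > 0
```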

Best Answer

Honestly, I do not think taking a log is always a good idea, even though it can guarantee positive predictions, because it puts more emphasis on small violations than on large ones: errors on small violation counts carry relatively more weight in the loss on the log scale than on the original scale (see the numeric example below). If that is not what you want, you probably should not use it.
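To make the weighting point concrete, a small made-up illustration: the same 5-violation error costs far more in log scale at a low score than at a high one.

```python
import numpy as np

# Same absolute error (5 violations) at a small and a large true score.
y_true = np.array([2.0, 90.0])
y_pred = np.array([7.0, 95.0])

# Squared error on the original scale: identical for both points.
print((y_true - y_pred) ** 2)                   # [25. 25.]

# Squared error on the log scale: the small-score point dominates.
print((np.log(y_true) - np.log(y_pred)) ** 2)   # ~[1.57, 0.003]
```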

A simpler idea is to train the model as usual on the original scale and, whenever it produces a negative prediction, clip it to 0. Tune the hyperparameters with that clipping in place (see the sketch below). I think this can give you a reasonable model.
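A minimal sketch of that clip-then-tune idea (the data, the Ridge model, the alpha grid, and the 0-100 cap are illustrative assumptions): score candidate hyperparameters on the clipped predictions, since that is how the model will actually be used.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV

def clipped_mse(y_true, y_pred):
    # Evaluate the model as deployed: negatives floored at 0 and,
    # since the scale tops out at 100, predictions capped there too.
    return mean_squared_error(y_true, np.clip(y_pred, 0, 100))

# greater_is_better=False because lower MSE is better.
scorer = make_scorer(clipped_mse, greater_is_better=False)

# Stand-in data and grid; substitute your own X, y, and parameter range.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.clip(rng.normal(loc=20, scale=15, size=200), 0, 100)

search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                      scoring=scorer, cv=5)
search.fit(X, y)

# Apply the same clipping at prediction time.
final_preds = np.clip(search.predict(X), 0, 100)
```

This keeps the loss on the original violation scale (so large violations are not down-weighted, unlike the log transform) while still guaranteeing in-range predictions.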