Poisson Regression – Using Poisson Deviance to Evaluate Models with Different Loss Functions

Tags: boosting, gradient, loss-functions, poisson-regression, random forest

I am currently doing a study on emergency department utilization rates at various geographic levels. Tree-based approaches to this analysis are of particular interest, namely random forests and GBMs.

I've built some models using scikit-learn's RandomForestRegressor and HistGradientBoostingRegressor, both of which I've run using Poisson loss. For evaluation, I have mainly been concerned with the proportion of deviance explained (D²) and the mean Poisson deviance.

However, to add some robustness I would also like to run some models that use MSE as loss and compare the performance to the Poisson-based models. I know this makes some false distributional assumptions – correct me if I am wrong – but given that the tree estimators are non-parametric, I don't believe this is entirely disingenuous.

I know that "raw" error metrics could be viable here (i.e., MAE, MSE, total error, etc.), but I want to maintain the Poisson-specific metrics if possible. So, could I still use Poisson deviance/deviance explained when MSE is the loss? Is there a statistical explanation for why or why not?

Best Answer

This is fine. You are just using a different estimator of your outcome (MSE-based instead of Poisson-based) that is evaluated on a particular loss function that happens to be related to the Poisson distribution.
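
To make this concrete, here are the usual definitions of the two metrics, written for observed counts y_i and predictions \hat{\mu}_i from any model. I believe these match what scikit-learn computes in mean_poisson_deviance and in d2_tweedie_score with power=1, but treat that as an assumption and check it against your version.

```latex
\[
\bar{D}(y, \hat{\mu})
  = \frac{2}{n} \sum_{i=1}^{n}
    \left[ y_i \log\frac{y_i}{\hat{\mu}_i} - \left( y_i - \hat{\mu}_i \right) \right],
\qquad
D^2 = 1 - \frac{\bar{D}(y, \hat{\mu})}{\bar{D}(y, \bar{y})}
\]
% Convention: y_i \log y_i = 0 when y_i = 0.
% Requirement: \hat{\mu}_i > 0 for every i.
```

Nothing in these formulas refers to how \hat{\mu}_i was obtained, so they can score predictions from an MSE-trained model just as well as from a Poisson-trained one. The one practical constraint is that every prediction must be strictly positive, which a squared-error fit does not guarantee.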

It is routine to optimize a loss function other than the ultimate loss function of interest. For instance, ridge regression minimizes square loss plus an L2 penalty on the coefficients rather than square loss alone, typically in an attempt to obtain a better out-of-sample square loss. In another answer I posted a few weeks ago, I give a situation where optimizing absolute loss results in better (out-of-sample) square loss, and the same idea applies to your situation.
