What are some of the best approaches for variable selection in Poisson regression?

feature-selection, poisson-regression

My target variable follows a Poisson distribution, and I need to select the best variables out of about 2000 candidates. Does any variable selection method exist specifically for Poisson-distributed targets? So far I am familiar with variable reduction methods such as lasso, IV, stepwise selection and PLS. What ideas could I try for the Poisson case? Would it work well to transform the target to log(y) and treat it as a linear model for variable selection?

Best Answer

You can use lasso or elastic-net regularisation. If you are an R user, both are available in glmnet for a Poisson response through the family = "poisson" option. Hopefully you have enough observations to split the data and choose the penalty strength by cross-validation (for example with cv.glmnet).
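Since the question is language-agnostic, here is a minimal NumPy sketch of the same idea outside R. It minimises the mean Poisson negative log-likelihood (log link) plus an elastic-net penalty lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2), with an unpenalised intercept, which is the form of objective glmnet uses. It picks lam by K-fold cross-validation on held-out Poisson deviance. The solver is proximal gradient descent with backtracking, not glmnet's coordinate descent. The function names (fit_poisson_enet, poisson_deviance, cv_lambda) are hypothetical, the data are simulated, and the columns of X are assumed to be already standardised (glmnet standardises by default).

```python
import numpy as np


def smooth_loss(X, y, b0, b, lam, alpha):
    """Mean Poisson negative log-likelihood (log link, log(y!) dropped) plus the ridge part."""
    with np.errstate(over="ignore", invalid="ignore"):
        eta = b0 + X @ b
        nll = np.mean(np.exp(eta) - y * eta)
    return nll + 0.5 * lam * (1 - alpha) * (b @ b)


def fit_poisson_enet(X, y, lam, alpha=1.0, max_iter=500, tol=1e-7):
    """Proximal gradient with backtracking; alpha=1 is the lasso, the intercept is unpenalised."""
    n, p = X.shape
    b0, b, step = np.log(y.mean() + 1e-12), np.zeros(p), 1.0
    for _ in range(max_iter):
        resid = np.exp(b0 + X @ b) - y
        g0 = resid.mean()
        g = X.T @ resid / n + lam * (1 - alpha) * b
        f = smooth_loss(X, y, b0, b, lam, alpha)
        while True:
            nb0 = b0 - step * g0
            z = b - step * g
            # Soft-thresholding is the proximal operator of the L1 penalty
            nb = np.sign(z) * np.maximum(np.abs(z) - step * lam * alpha, 0.0)
            d0, d = nb0 - b0, nb - b
            bound = f + g0 * d0 + g @ d + (d0 * d0 + d @ d) / (2 * step)
            if smooth_loss(X, y, nb0, nb, lam, alpha) <= bound:
                break
            step *= 0.5  # shrink the step until the quadratic upper bound holds
        b0, b = nb0, nb
        if max(abs(d0), np.abs(d).max()) < tol:
            break
    return b0, b


def poisson_deviance(y, mu):
    # y * log(y / mu) is taken as 0 when y == 0
    safe_y = np.where(y > 0, y, 1.0)
    return 2 * np.sum(np.where(y > 0, y * np.log(safe_y / mu), 0.0) - (y - mu))


def cv_lambda(X, y, alpha=1.0, n_lambda=10, n_folds=5, seed=0):
    """Pick lam on a log grid by K-fold CV deviance (requires alpha > 0)."""
    n = len(y)
    # Smallest lam at which every coefficient is exactly zero
    lam_max = np.abs(X.T @ (y - y.mean())).max() / (n * alpha)
    lams = np.geomspace(lam_max, 0.01 * lam_max, n_lambda)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    cv_dev = np.zeros(n_lambda)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        for i, lam in enumerate(lams):
            b0, b = fit_poisson_enet(X[tr], y[tr], lam, alpha)
            cv_dev[i] += poisson_deviance(y[te], np.exp(b0 + X[te] @ b))
    return lams, cv_dev, lams[np.argmin(cv_dev)]


# Simulated example: only the first 3 of 40 predictors matter
rng = np.random.default_rng(42)
n, p = 400, 40
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [0.5, -0.4, 0.3]
y = rng.poisson(np.exp(0.5 + X @ beta)).astype(float)

lams, cv_dev, best = cv_lambda(X, y)
b0, b = fit_poisson_enet(X, y, best)
print("selected columns:", np.flatnonzero(b))
```

The columns with non-zero coefficients at the chosen lambda are the selected variables. With 2000 real predictors you would standardise the columns first, and glmnet's coordinate descent with warm starts along the lambda path will be much faster than this sketch.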

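Stepwise selection methods are generally best avoided, particularly with 2000 candidate variables. The selected set tends to be unstable, and the p-values and standard errors of the final model are overly optimistic because they ignore the selection step.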

Taking log(y) isn't a good idea when the data are Poisson, because any zero counts give log(0), which is undefined. You could use log(y + 1) instead, but since glmnet supports the Poisson distribution directly, this doesn't seem necessary unless there are computational limitations.
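A small illustration of why zeros matter, as a sketch with made-up counts. Python's math.log raises an error at 0, and log1p (log(y + 1)) is finite but applies an arbitrary shift to the response. The Poisson model avoids the problem entirely because its log link is applied to the mean mu > 0, not to y. The helper names safe_log and poisson_loglik are hypothetical.

```python
import math

counts = [0, 3, 1, 0, 7]  # made-up count data containing zeros


def safe_log(v):
    # math.log(0) raises ValueError, mirroring why log(y) breaks on zero counts
    try:
        return math.log(v)
    except ValueError:
        return None


log_y = [safe_log(v) for v in counts]            # None wherever y == 0
log_y_plus_1 = [math.log1p(v) for v in counts]   # finite, but an ad hoc shift of y


def poisson_loglik(y, mu):
    # log P(Y = y) = y*log(mu) - mu - log(y!); the log is taken of mu > 0, never of y
    return y * math.log(mu) - mu - math.lgamma(y + 1)


total_ll = sum(poisson_loglik(v, 2.0) for v in counts)
print(log_y, log_y_plus_1, total_ll)
```

Zero counts contribute a perfectly finite term to the Poisson log-likelihood (here -mu), so there is no need to transform the response before fitting.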
