My target variable follows a Poisson distribution. I have to select the best variables out of about 2000 candidates. Are there variable selection methods suited to a Poisson-distributed target? So far I am familiar with variable reduction methods such as lasso, IV, stepwise selection, and PLS. What approaches could I try for Poisson data? Would it work well to transform the target to log(y) and treat the problem as a linear model for variable selection?
Solved – What are some of the best approaches for variable selection in Poisson regression
feature selection, poisson-regression
Best Answer
You can use lasso or elastic net regularisation. Both are available in `glmnet`, if you are an R user, and both support a Poisson dependent variable through the `family = "poisson"` option. Hopefully you have sufficient observations to be able to split the dataset and do cross-validation.
Stepwise selection methods are generally best avoided, particularly with 2000 variables.
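As a rough illustration of the lasso approach, here is a minimal sketch using `cv.glmnet`. The data are simulated, and the dimensions, the coefficient values, and the names `x`, `y`, and `selected` are all assumptions made for the example. The problem is also kept smaller than 2000 variables so that it runs quickly.

```r
# Sketch only: simulated data stands in for the real dataset.
library(glmnet)

set.seed(1)
n <- 300   # observations (assumed)
p <- 200   # candidate variables (smaller than 2000 to keep the run short)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("v", seq_len(p))

# Hypothetical truth: only the first 5 variables affect the Poisson mean.
beta <- c(rep(0.3, 5), rep(0, p - 5))
y <- rpois(n, lambda = exp(0.5 + drop(x %*% beta)))

# alpha = 1 is the lasso; a value between 0 and 1 gives an elastic net.
cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1, nfolds = 10)

# Coefficients at lambda.1se, the more conservative of the two standard choices.
b <- as.matrix(coef(cvfit, s = "lambda.1se"))[, 1]

# Variables with a non-zero coefficient are the ones the lasso kept.
selected <- setdiff(names(b)[b != 0], "(Intercept)")
print(selected)
```

Using `s = "lambda.min"` instead usually keeps more variables. Which of the two is preferable depends on whether a sparser model or lower cross-validated deviance matters more.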
Taking `log(y)` isn't a good idea if the data are Poisson, since you will be taking the log of zero whenever a count is zero. Of course you could use `log(y+1)`, but since `glmnet` supports the Poisson distribution this doesn't seem necessary unless there are computational limitations.
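To make the problem with zero counts concrete, here is a small sketch using a made-up count vector (the values are purely illustrative):

```r
# Hypothetical counts, including zeros as Poisson data often do.
y <- c(0, 1, 3, 0, 7)

log_y <- log(y)      # zeros map to -Inf, which a linear model cannot use
log1p_y <- log1p(y)  # log(y + 1): finite for every non-negative count

print(log_y)
print(log1p_y)
```

`log1p(y)` is base R's numerically stable form of `log(y + 1)`.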