Solved – feature importance using neural network

classification, feature selection, neural networks, python

Is it good practice to estimate feature importance in a neural network by taking the absolute value of the sum of each feature's weights into the first hidden layer?
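For concreteness, here is a minimal sketch of the heuristic being asked about, assuming a dense first layer whose weight matrix has shape (n_features, n_hidden); the matrix values here are made up for illustration:

```python
import numpy as np

# Hypothetical first-layer weight matrix: rows = input features, columns = hidden units.
# (In Keras this would typically come from model.layers[0].get_weights()[0].)
W = np.array([[3.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0],
              [0.0, 2.0]])

# The heuristic from the question: absolute value of each feature's summed weights.
importance = np.abs(W.sum(axis=1))
print(importance)  # [3. 1. 2. 2.]
```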

Best Answer

Say that all of our features have value 1. Features one and two have weights 3 and 1, respectively; they feed node A, which activates with 1*3 + 1*1 = 4. Features three and four have weights 2 each; they feed node B, which also activates with 1*2 + 1*2 = 4. In the next layer, node A has weight 0.4 and node B has weight 0.6. Is feature one more important than features three and four?
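To make the arithmetic concrete, here is a minimal sketch of that toy two-layer network in NumPy, assuming linear activations for simplicity:

```python
import numpy as np

x = np.ones(4)                      # all four features equal 1
W1 = np.array([[3.0, 0.0],          # feature 1 -> node A
               [1.0, 0.0],          # feature 2 -> node A
               [0.0, 2.0],          # feature 3 -> node B
               [0.0, 2.0]])         # feature 4 -> node B
w2 = np.array([0.4, 0.6])           # weights on nodes A and B in the next layer

hidden = x @ W1                     # [4., 4.]  both hidden nodes activate identically
output = hidden @ w2                # 4*0.4 + 4*0.6 = 4.0
# Feature 1 has the largest first-layer weight (3), yet its path is
# down-weighted by 0.4 in the next layer, while features 3 and 4 go
# through the 0.6 path: first-layer weights alone don't settle importance.
print(hidden, output)
```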

What if there are 7 more layers?

Often, neural networks are used in settings where features interact so heavily that the concept of importance is not really well defined (e.g., pixel data). There is, however, a lot of work on interpreting neural networks.

As far as feature importance goes: if the features truly have distinct importances, it might be worth using a different model that exposes them directly (e.g., LASSO).
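A minimal sketch of that route, using a synthetic dataset and an L1-penalised logistic regression (the LASSO idea applied to classification) purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real problem (illustration only).
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)
X = StandardScaler().fit_transform(X)   # scale so coefficients are comparable

# Sparse linear model: the L1 penalty shrinks unimportant coefficients to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

# The surviving (nonzero) coefficients point at the features the model kept.
print(np.abs(clf.coef_).ravel())
```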

With a neural network, one possibility is to shuffle each feature and see what happens to predictive performance; this is how permutation importance is computed for random forests. I have also seen some recent papers where the authors, I think, mask features and check the effect. Another option suggested here is to compute the gradient of the output with respect to the inputs.
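A minimal sketch of the shuffle-and-score (permutation importance) idea, using a small synthetic dataset and scikit-learn's MLPClassifier purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy setup so the sketch is self-contained; swap in your own data and model.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                      random_state=0).fit(X_train, y_train)

baseline = model.score(X_test, y_test)
rng = np.random.default_rng(0)

for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, j])                 # break the link between feature j and y
    drop = baseline - model.score(X_perm, y_test)
    print(f"feature {j}: accuracy drop {drop:.3f}")
```

scikit-learn's sklearn.inspection.permutation_importance does the same bookkeeping for you, and repeating the shuffle several times per feature gives more stable estimates.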
