Neural Network Feature Importance – Exploring Network Weights as a Measure of Feature Importance

deep learning, feature selection, neural networks

I have constructed a neural network and trained it on a dataset.
The classification performance was quite good, but I wonder whether there is a way to find which inputs were more relevant in classifying the test samples.

Since a neural network is a weighted graph, I think summing up the weights associated with each input could measure its feature importance, but is that a proper approach?

Suppose the network contains two hidden layers (h1, h2), each with two nodes (n11, n12 and n21, n22), and two input variables (X, Y).

For example, for the two inputs X and Y, suppose the weight of each edge is as below.

weight of X to n11 = 0.1 
weight of X to n12 = 0.2
weight of Y to n11 = 0.2
weight of Y to n12 = 0.3
weight of n11 to n21 = 0.4
weight of n12 to n21 = 0.6

I think the weight of each input can then be calculated simply:

weight of X = 0.1*(0.4+0.6) + 0.2*(0.4+0.6) = 0.3
weight of Y = 0.2*(0.4+0.6) + 0.3*(0.4+0.6) = 0.5
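
For concreteness, here is a small NumPy sketch that just reproduces this arithmetic; the array layout and variable names are my own assumption about how the weights would be stored, not part of the trained network.

```python
import numpy as np

# Weights from the example above.
# Rows are the inputs (X, Y), columns are the first hidden-layer nodes (n11, n12).
W_in_h1 = np.array([[0.1, 0.2],   # X -> n11, X -> n12
                    [0.2, 0.3]])  # Y -> n11, Y -> n12
# Weights from the first hidden layer into n21.
W_h1_n21 = np.array([0.4, 0.6])   # n11 -> n21, n12 -> n21

# The calculation written above:
# weight(input) = sum_j w(input -> n1j) * (0.4 + 0.6)
importance = W_in_h1.sum(axis=1) * W_h1_n21.sum()
print(dict(zip(["X", "Y"], importance)))  # {'X': 0.3..., 'Y': 0.5}
```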

In this case, can I say that input Y is a better feature than input X for classification with this neural network?

Best Answer

I am no expert, but I think you should calculate the gradient of the loss function with respect to the inputs. This tells you the relative importance of each input for classifying a specific training example. To get a measure of an input's importance over the whole training set, you could average these gradients across all examples, or take the maximum (a low value would mean the input never had much effect).
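
A minimal sketch of this idea in PyTorch, assuming a toy two-input classifier and random placeholder data (the architecture and data here are illustrative, not the asker's model); the absolute value of the gradients is taken so contributions of opposite sign don't cancel when averaging:

```python
import torch
import torch.nn as nn

# Small feedforward classifier like the one described in the question:
# 2 inputs -> 2 hidden layers of 2 nodes -> 2 class logits.
model = nn.Sequential(
    nn.Linear(2, 2), nn.ReLU(),
    nn.Linear(2, 2), nn.ReLU(),
    nn.Linear(2, 2),
)
loss_fn = nn.CrossEntropyLoss()

# Dummy training set: 100 examples, 2 features (X, Y), binary labels.
inputs = torch.randn(100, 2, requires_grad=True)  # track gradients w.r.t. the inputs
labels = torch.randint(0, 2, (100,))

loss = loss_fn(model(inputs), labels)
loss.backward()                                   # fills inputs.grad with dLoss/dInput

per_example = inputs.grad.abs()                   # |gradient| per example, per feature
avg_importance = per_example.mean(dim=0)          # average over the training set
max_importance = per_example.max(dim=0).values    # or take the maximum instead
print(avg_importance, max_importance)
```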