Solved – Feature selection using deep learning

deep-learning, deep-belief-networks, feature-selection, restricted-boltzmann-machine

I want to calculate the importance of each input feature using a deep model.

But I have found only one paper about feature selection using deep learning, "Deep Feature Selection". The authors insert a layer of nodes connected directly to each input feature, before the first hidden layer.

I have heard that a deep belief network (DBN) can also be used for this kind of work. But as far as I understand, a DBN provides only abstractions (clusters) of features, much like PCA, so while it can reduce the dimensionality effectively, I am not sure whether the importance (weight) of each original feature can be calculated from it.

Is it possible to calculate the feature importance with a DBN?
And are there other known methods for feature selection using deep learning?

Best Answer

One approach you can take for almost any prediction model is to first train your model and measure its accuracy, then add some noise to one input and measure the accuracy again. Repeat this for each input and observe how much the noise worsens the predictions. If an input is important, then the extra uncertainty due to the noise will be detrimental.

Remember to set the variance of the noise proportional to the variance of the input in question.

Of course the noise is random, and you don't want an input to appear unimportant purely due to random effects. If you have few training examples, consider repeatedly calculating the change in accuracy for each training example, with new noise drawn each time, and averaging the results.
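A minimal sketch of this noise-perturbation idea, assuming a scikit-learn-style model with a .score(X, y) method. The dataset, the MLP settings, and the function name noise_importance are illustrative assumptions, not part of the answer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def noise_importance(model, X_val, y_val, noise_scale=1.0, n_repeats=20, seed=0):
    """Mean drop in accuracy when Gaussian noise is added to one feature at a time."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X_val, y_val)
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        # Noise standard deviation proportional to the spread of the feature in question.
        sigma = noise_scale * X_val[:, j].std()
        drops = []
        for _ in range(n_repeats):          # fresh noise on each repeat
            X_noisy = X_val.copy()
            X_noisy[:, j] += rng.normal(0.0, sigma, size=X_val.shape[0])
            drops.append(baseline - model.score(X_noisy, y_val))
        importances[j] = np.mean(drops)
    return baseline, importances

# Illustrative usage on synthetic data.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
model.fit(X_train, y_train)
baseline, imp = noise_importance(model, X_val, y_val)
print("baseline accuracy:", baseline)
print("mean accuracy drop per feature:", np.round(imp, 3))
```

The features whose perturbation causes the largest accuracy drop are the ones the model relies on most.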

In response to the comments:

This analysis can also be done by removing a variable entirely, but that has some downsides compared to adding noise (a rough sketch of the removal approach follows the points below).

  • Suppose one of your inputs is constant. It acts like a bias term, so it plays some role in the prediction even though it adds no information. If you removed this input entirely, the prediction would become less accurate because the perceptrons would be getting the wrong bias, which makes the input look important for prediction even though it adds no information. Adding noise doesn't cause this problem. (This first point isn't an issue if you have standardized all inputs to have zero mean.)

  • If two inputs are correlated, then information about one input gives information about the other. A model could be trained well using only one of the correlated inputs, so you want the analysis to find that the other input isn't helpful. If you simply removed one of the inputs then, as in the first point, the prediction accuracy would decrease a lot, which suggests it is important. However, adding noise won't cause this problem.
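For contrast, a rough sketch of the removal-based analysis discussed above, continuing from the earlier example (same X_train/X_val splits); the helper name removal_importance and the MLP settings are illustrative assumptions:

```python
from sklearn.neural_network import MLPClassifier

def removal_importance(X_train, y_train, X_val, y_val):
    """Drop in accuracy when one feature is removed and the model is retrained."""
    def make_model():
        return MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    baseline = make_model().fit(X_train, y_train).score(X_val, y_val)
    drops = []
    for j in range(X_train.shape[1]):
        keep = [k for k in range(X_train.shape[1]) if k != j]  # all features except j
        m = make_model().fit(X_train[:, keep], y_train)
        drops.append(baseline - m.score(X_val[:, keep], y_val))
    return baseline, drops
```

With a constant or a correlated input, this removal approach can overstate importance in exactly the ways described in the two points above.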
