Solved – Singular Value Decomposition (SVD) for feature selection

Tags: feature-selection, machine-learning, svd

I was reading a paper called "Production Optimization Using Machine Learning in Bakken Shale", and came across an approach I was a bit puzzled by. Unfortunately, the paper is behind a paywall, but I will explain the case:

The researchers have a dataset with 14 predictors, representing different geological properties, well-design properties, etc. In order to conduct feature selection, they first run a Singular Value Decomposition (SVD), and state that "eight principal components can explain more than 90% of total input variance":

[Figure: SVD plot showing the variance explained by the principal components]
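The paper's code is not available, but here is a minimal sketch of how such a cumulative-variance curve is typically produced; the data below is a random placeholder standing in for the 14 predictors:

```python
import numpy as np

# Placeholder data standing in for the paper's 14 predictors
# (the real dataset is behind the paywall); shapes are illustrative only
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))

# Standardize the columns, then take the SVD of the data matrix
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)

# Squared singular values are proportional to the variance captured
# by each principal component
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)
print(cumulative)  # e.g. check how many components pass the 90% mark
```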

Further, they run a Random Forest (RF) with all 14 features and rank the features according to their variable importance score. Additionally, they perform Recursive Feature Elimination (RFE), which also ranks the features.
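Again, this is not the authors' code, but a hedged sketch of how both rankings are commonly obtained with scikit-learn, assuming a regression target and the same placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))                              # stand-in predictors
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=500)  # toy target

# Variable importance from a random forest fit on all 14 features
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
importance_rank = np.argsort(rf.feature_importances_)[::-1]  # best first

# Recursive feature elimination yields its own per-feature ranking (1 = best)
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
          n_features_to_select=1).fit(X, y)
rfe_rank = rfe.ranking_

print(importance_rank)
print(rfe_rank)
```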

After this, they refer to the output from the SVD and state "as eight input parameters would be enough, the features we select for our deep learning model are:", and then they list the 8 features that were ranked highest across the two methods mentioned above (RF and RFE).
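As far as I understand their procedure, they combine the two rankings and keep the top eight, with k = 8 taken from the SVD result. A rough sketch of that step, reusing `importance_rank` and `rfe_rank` from the previous snippet (this is my interpretation, not code from the paper):

```python
import numpy as np

k = 8  # from the SVD result ("eight components explain > 90% of variance")

# Convert the RF importance order into per-feature positions, then
# combine with the RFE ranking and keep the k best features
rf_position = np.empty(14, dtype=int)
rf_position[importance_rank] = np.arange(14)
combined = rf_position + (rfe_rank - 1)   # lower is better
selected = np.argsort(combined)[:k]
print(selected)                           # indices of the 8 chosen features
```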

My question is: Is this a valid way of utilizing the output from a Singular Value Decomposition? I figured these "eight principal components" would be transformed versions of the original variables, so it would not be directly valid to apply this insight to the variables in their original form. Please correct me if I'm wrong!
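To illustrate what I mean, continuing the SVD sketch above: each principal component mixes all of the original columns rather than picking out a subset of them.

```python
# Continuing from Xs and Vt computed in the SVD sketch above
pc_scores = Xs @ Vt.T     # component scores, shape (n_samples, 14)
loadings = Vt             # row i holds PC i's weights on the 14 inputs
print(loadings[0])        # the first PC involves every original variable
```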

Best Answer

PCA is related to SVD, so your general question is answered in the "Using principal component analysis (PCA) for feature selection" thread.
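For completeness, here is a small sketch (toy data, not from the paper) showing that the PCA scores can be recovered from the SVD of the centered data matrix, which is why the two are used interchangeably here:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
Xc = X - X.mean(axis=0)

# PCA scores equal U * s (equivalently Xc @ V) from the SVD of the
# centered data matrix, so the two carry the same information
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = PCA(n_components=14).fit_transform(Xc)

# Corresponding columns can differ by sign only, so compare absolute values
print(np.allclose(np.abs(pca_scores), np.abs(U * s)))
```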

In the paper, the authors give the following rationale for feature selection:

Feature selection can facilitate data visualization/understanding and improve prediction performance by reducing redundant input dimension (Guyon and Elisseeff 2003). In our study, we used the combination of several methods (listed below) to select the useful features for prediction.

Those may be valid arguments in general, but they do not seem reasonable for this setting. First, going down from 14 to 8 features does not sound like a great improvement in terms of interpretability or visualization, since 14 is already a small number of features. Second, they used a neural network, which is already a "black box" model that is hard to interpret; the same goes for a random forest, where it is not feasible to draw all the trees in the forest and interpret them directly. For both neural networks and random forests there are indirect methods for assessing feature importance, and those would work the same, and be equally interpretable, whether they used 8 or 14 features. Third, neither random forests nor neural networks have problems with redundant features: both algorithms can "select" the appropriate features by themselves. I'm not saying that feature selection is never useful with those algorithms, but it is unlikely to matter when you have only 14 variables.
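As a quick illustration of the last point, one can compare the cross-validated performance of a random forest with and without redundant copies of the features; the data below is a toy placeholder, not from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X8 = rng.normal(size=(500, 8))                                        # informative block
X14 = np.hstack([X8, X8[:, :6] + 0.05 * rng.normal(size=(500, 6))])  # plus redundant copies
y = X8[:, :3].sum(axis=1) + rng.normal(scale=0.2, size=500)

rf = RandomForestRegressor(n_estimators=300, random_state=0)
score_8 = cross_val_score(rf, X8, y, cv=5).mean()    # 8 features
score_14 = cross_val_score(rf, X14, y, cv=5).mean()  # 14 features, 6 redundant
print(score_8, score_14)   # the scores are typically very close
```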
