I have to admit, I initially thought chi2 and f_classif might be the culprits, so I quickly wrote the two functions below. The first looks at the feature importances calculated by a random forest classifier:
def get_rf_feat_importances(X, Y):
    from sklearn.ensemble import RandomForestClassifier
    rf = RandomForestClassifier()
    rf.fit(X, Y)
    return rf.feature_importances_
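For context, here is a usage sketch of the kind of check this function enables; the dataset parameters below are illustrative, not the asker's actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data: 10 features, of which only 3 are informative (the rest are noise).
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

rf = RandomForestClassifier(random_state=0)
rf.fit(X, y)
importances = rf.feature_importances_

# Importances are normalised to sum to 1; rank them to see which dominate.
ranked = np.argsort(importances)[::-1]
print("top features:", ranked[:3], importances[ranked[:3]])
```

Comparing how many features carry most of the importance mass against `n_informative` is the check being made throughout this answer.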
And the other plots the regularisation path:
def get_LARS_Lasso_path(X, Y):
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model

    alphas, _, coefs = linear_model.lars_path(X.values, Y.values, method='lasso', verbose=True)

    # Normalise the x-axis to the fraction of the full coefficient budget.
    xx = np.sum(np.abs(coefs.T), axis=1)
    xx /= xx[-1]

    plt.plot(xx, coefs.T)
    ymin, ymax = plt.ylim()
    plt.vlines(xx, ymin, ymax, linestyle='dashed')
    plt.xlabel('|coef| / max|coef|')
    plt.ylabel('Coefficients')
    plt.title('LASSO Path')
    plt.axis('tight')
    plt.savefig('Lasso_Path.png')
To my surprise, these show similar results. The feature importances generated by the first and the regularisation path generated by the second sometimes indicate the same number of informative features (especially for 2), but in most cases the number of informative features they indicate is less than what was passed to the make_classification function.
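The same check can be made without plotting: count how many variables the LARS-lasso path actually activates. A minimal sketch, again with illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import lars_path

# Illustrative data: 10 features, only 3 informative.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

# lars_path expects a regression target, so cast the labels to float.
alphas, active, coefs = lars_path(X, y.astype(float), method='lasso')

# `active` holds the indices of the variables still in the model at the
# end of the path; comparing its length with n_informative makes the
# same point as the plot.
print(len(active), "of", X.shape[1], "features are active at the end of the path")
```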
Answers:
First, to question 2): from my two functions above, it seems the phenomenon is not specific to the chi2 or f_classif scores. What these two scores do is already explained well here, so I am not going to repeat it.
1) The only thing I can think of here is that all of these methods look at the importance of each feature individually. It is possible that the informative features are correlated with each other, so that once one feature's contribution to predictive performance is accounted for, the others become redundant. This is explained in this comprehensive (albeit slightly dated) review:
In Section 4.2, we introduced nested subset methods that provide a
useful ranking of subsets, not of individual variables: some variables
may have a low rank because they are redundant and yet be highly
relevant.
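The redundancy effect described in that quote is easy to reproduce. In this deliberately constructed sketch (the variable names are illustrative), two columns carry nearly the same signal, so neither gets full credit on its own:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
signal = rng.normal(size=2000)
noise = rng.normal(size=2000)

X = np.column_stack([
    signal,                                 # informative
    signal + 0.01 * rng.normal(size=2000),  # near-duplicate of column 0
    noise,                                  # pure noise
])
y = (signal > 0).astype(int)

rf = RandomForestClassifier(random_state=0).fit(X, y)
imp = rf.feature_importances_
print(imp)
# The importance tends to be shared between columns 0 and 1, even though
# a single underlying variable drives y.
```

This is exactly the situation where a per-feature ranking understates how many "informative" columns the data-generating process used.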
Best Answer
Your result is correct: XGB recognizes that many of your features are not important and did not use them when building its decision trees. You can force XGB to use all of them by increasing the max tree depth setting, but you would be overfitting the data that way.
Back to your problem: only 84 features are used by XGB, so discarding the others produces a very similar result.
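I don't have the asker's data or XGB model at hand, so as a stand-in here is the same check sketched with scikit-learn's GradientBoostingClassifier (an assumed substitute, not the original XGB setup): count the features the boosted trees actually split on, then retrain on only those and compare scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: 50 features, only 5 informative.
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=5, n_redundant=0,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Features with zero importance were never split on by any tree.
used = np.flatnonzero(gbm.feature_importances_)
full_score = gbm.score(X_te, y_te)

# Retrain on only the used features: the score should be very similar.
reduced = GradientBoostingClassifier(random_state=0).fit(X_tr[:, used], y_tr)
reduced_score = reduced.score(X_te[:, used], y_te)
print(len(used), "features used; scores:", full_score, "vs", reduced_score)
```

The same idea applies to XGB: features the booster never splits on contribute nothing, so dropping them leaves the model essentially unchanged.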