The relationship between bagging and XGBoost or logistic regression

bagging, boosting, logistic, machine learning, regression

I am practicing classification with machine learning on a large set of samples (about 20,000), half of which are labelled training data and the other half testing data. Each sample has 13 features, and the classification target has 9 different families (classes). I have read up on different models to get the best prediction accuracy, but I'm stuck at this point and don't know in which order to do things.

I have my feature matrices for both the training and testing data and would like to start training my model. From what I understand, bagging is very popular for problems like this, and I would like to try a logistic regression model as well as XGBoost.

Do I apply bagging before testing my models, or do I train my models first and then use bagging to reduce the variance? What is the relationship between bagging and the two models?

Best Answer

Bagging ("bootstrap aggregating") works as follows: you resample your training data with replacement $m$ times, fit one estimator to each of the $m$ bootstrap samples, and then average their predictions over new data. So it is not a preprocessing step you apply before or after training; it is a way of training an ensemble of models, and the averaging is what reduces variance.
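To make that concrete, here is a minimal from-scratch sketch of the procedure, assuming scikit-learn-style estimators that expose `predict_proba`; the function name `bag_predict` and the default $m = 25$ are illustrative choices, not anything prescribed by bagging itself:

```python
import numpy as np
from sklearn.base import clone

def bag_predict(base_estimator, X_train, y_train, X_new, m=25, seed=0):
    """Bagging: fit m clones of base_estimator, each on a bootstrap
    resample of the training data, and average their predicted
    class probabilities on X_new."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    probas = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        est = clone(base_estimator)             # fresh, unfitted copy
        est.fit(X_train[idx], y_train[idx])     # fit on the bootstrap sample
        probas.append(est.predict_proba(X_new)) # predict on new data
    return np.mean(probas, axis=0)              # average the m predictions
```

Calling something like `bag_predict(LogisticRegression(max_iter=1000), X_train, y_train, X_test)` would return the averaged class probabilities of 25 bagged logistic regressions.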

Logistic regression is not a method that uses bagging itself, but you can bag a bunch of logistic regressions, as sketched below.
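In practice you don't need to write the loop yourself: scikit-learn's `BaggingClassifier` can wrap a logistic regression directly. A minimal sketch, assuming scikit-learn >= 1.2 (where the base-model argument is named `estimator`) and using synthetic data shaped like the question (20,000 samples, 13 features, 9 classes, split 50/50):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data described in the question.
X, y = make_classification(n_samples=20_000, n_features=13,
                           n_informative=10, n_classes=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Bag 25 logistic regressions: each is fit on its own bootstrap
# resample, and their predictions are aggregated at predict time.
bagged_lr = BaggingClassifier(
    estimator=LogisticRegression(max_iter=1000),
    n_estimators=25,
    random_state=0,
)
bagged_lr.fit(X_train, y_train)
print("bagged logistic regression accuracy:", bagged_lr.score(X_test, y_test))
```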

The "GB" in XGBoost stands for "Gradient Boosting" which is a different technique.