Solved – Averaging predictions from two different models

boosting, geography, machine learning, random forest

First, a short introduction. I am predicting latitude/longitude using Random Forests and XGBoost based on several environmental variables and custom features such as cluster IDs (there are obvious spatial clusters in the data). On the validation set, the predictions from RF are slightly better than those from XGBoost (I am minimising RMSE).

I came up with the idea of taking a (weighted) average of these raw predictions and calculating RMSE on the result. My reasoning is that if we assign a higher weight to the better model (in this case RF) and a lower one to the weaker model (XGBoost), the predictions could improve. And indeed, the RMSE is lower when I combine the predictions.
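For concreteness, here is a minimal sketch of the combination (rf_pred, xgb_pred and actual are placeholder names, not the actual variables from my script):

# Minimal sketch: convex combination of two prediction vectors
# (rf_pred, xgb_pred, actual are placeholders for the real data)
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

w <- 0.7                                   # weight given to the RF model
blend <- w * rf_pred + (1 - w) * xgb_pred  # weighted average of predictions
rmse(blend, actual)                        # error of the blended prediction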

I also attached 2D plots of RMSE as a function of the two weights, for both latitude and longitude. The two surfaces look almost identical. What's interesting is that the results follow the intuition: assigning a bigger weight to the better model (RF) results in a better RMSE than assigning a bigger weight to the weaker model (XGBoost). This can be seen in the plots below.

The best weights were found to be as follows:

[Table: best RF and XGBoost weights for latitude and longitude]

Now, my question is: is this a valid approach from a statistical/machine learning point of view? If not, are there other ways of combining model results (apart from model ensembles) which I could try?

[Plot: RMSE surface for latitude]
[Plot: RMSE surface for longitude]

Code for analysis:

# Grid of candidate weights for each model; the blend is normalised by the
# weight sum, so only the ratio of the two weights actually matters
weight_rf <- seq(0, 1, by = 0.01)
weight_xgb <- seq(0, 1, by = 0.01)
len_rf <- length(weight_rf)
len_xgb <- length(weight_xgb)
lat_grid <- matrix(nrow = len_rf, ncol = len_xgb)
lon_grid <- matrix(nrow = len_rf, ncol = len_xgb)

pb <- txtProgressBar(min = 0, max = len_rf * len_xgb, style = 3)

iter <- 1

for (i in 1:len_rf) {
      for (j in 1:len_xgb) {
            w_sum <- weight_rf[i] + weight_xgb[j]
            if (w_sum > 0) {  # skip (0, 0), which would give 0/0 = NaN
                  mean.lat.pred <- (weight_rf[i] * rf.lat.pred + weight_xgb[j] * xgb.lat.pred) / w_sum
                  mean.lon.pred <- (weight_rf[i] * rf.lon.pred + weight_xgb[j] * xgb.lon.pred) / w_sum
                  # compute the metrics once per cell instead of twice
                  metrics <- return_metrics(mean.lat.pred, mean.lon.pred, tz$lat_deg[-train], tz$lon_deg[-train])
                  lat_grid[i, j] <- metrics$RMSE_lat
                  lon_grid[i, j] <- metrics$RMSE_lon
            }
            iter <- iter + 1
            setTxtProgressBar(pb, iter)
      }
}

# Heat maps of RMSE over the weight grid
image(lat_grid, xlab = "RF weight", ylab = "XGBoost weight", main = "RMSE for Latitude", col = color_ramp, axes = FALSE)
axis(1, at = seq(0, 1, length.out = len_rf), labels = weight_rf)
axis(2, at = seq(0, 1, length.out = len_xgb), labels = weight_xgb)
image(lon_grid, xlab = "RF weight", ylab = "XGBoost weight", main = "RMSE for Longitude", col = color_ramp, axes = FALSE)
axis(1, at = seq(0, 1, length.out = len_rf), labels = weight_rf)
axis(2, at = seq(0, 1, length.out = len_xgb), labels = weight_xgb)

# Row/column indices of the minimum RMSE in each grid
lat_best <- arrayInd(which.min(lat_grid), dim(lat_grid))
lon_best <- arrayInd(which.min(lon_grid), dim(lon_grid))

# Rows index weight_rf, columns index weight_xgb
lat_rf_best <- weight_rf[lat_best[1]]
lat_xgboost_best <- weight_xgb[lat_best[2]]

lon_rf_best <- weight_rf[lon_best[1]]
lon_xgboost_best <- weight_xgb[lon_best[2]]

best_df <- data.frame(rf = c(lat_rf_best, lon_rf_best), xgboost = c(lat_xgboost_best, lon_xgboost_best))
rownames(best_df) <- c("lat", "lon")
best_df

Best Answer

It is a valid approach: this is a model ensemble, with averaging being the most basic ensemble method. If the base models' predictions are not highly correlated and neither model is very bad on its own, you can combine their predictions and get the benefit of each model; the blended result is then often better than either base model's alone.
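As a side note on the search itself: since the blend divides by the sum of the weights, only the ratio of the two weights matters, so your 2D grid is effectively a one-dimensional search. A minimal sketch of the equivalent one-parameter version, reusing the variables from your question with base R's optimize() (my suggestion, not part of your original analysis):

# One-parameter version of the weight search: w is the RF weight and
# 1 - w the XGBoost weight, so the 2D grid collapses to a line search
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

blend_rmse <- function(w, rf_pred, xgb_pred, actual) {
      rmse(w * rf_pred + (1 - w) * xgb_pred, actual)
}

# RMSE is unimodal in w, so optimize()'s golden-section search finds
# the same optimum as the exhaustive grid, only faster
opt <- optimize(blend_rmse, interval = c(0, 1),
                rf_pred = rf.lat.pred, xgb_pred = xgb.lat.pred,
                actual = tz$lat_deg[-train])
opt$minimum    # best RF weight for latitude
opt$objective  # RMSE at that weight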

Why is averaging so effective? To quote the Kaggle Ensembling Guide (http://mlwave.com/kaggle-ensembling-guide/):

One may be mystified as to why averaging helps so much, but there is a simple reason for the effectiveness of averaging. Suppose that two classifiers each have an accuracy of 70%. Then, when they agree, they are most likely right. But when they disagree, one of them is often right, so the average prediction will place much more weight on the correct answer.
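A toy simulation makes the same point (everything here is made up for illustration): two independent scores, each about 70% accurate on their own, become noticeably more accurate once averaged.

# Toy illustration: two independent noisy classifiers, ~70% accurate each;
# averaging their scores cancels some noise and lifts the accuracy
set.seed(42)
n <- 1e5
truth <- rbinom(n, 1, 0.5)       # true binary labels
s1 <- truth + rnorm(n, sd = 1)   # noisy score from classifier 1
s2 <- truth + rnorm(n, sd = 1)   # noisy score from classifier 2
accuracy <- function(score) mean((score > 0.5) == (truth == 1))
accuracy(s1)             # ~0.69
accuracy(s2)             # ~0.69
accuracy((s1 + s2) / 2)  # ~0.76: the average beats both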
