Solved – Time Series and XGBoost

boosting, time series

I have time series data covering two and a half years. I run xgb.cv on the data from Jan 2014 to March 2016 and it gives very good results, but when I predict on new data (April 2016 to June 2016), the accuracy is much worse. I am using the R xgboost package. Any suggestions for improving the accuracy?

Best Answer

It is likely that your features are time-biased. For example, suppose month and year are among your predictors. You would see very good performance during CV, because there are enough data points for each month, but the model still cannot predict the future (April 2016 to June 2016) well. Another example is a feature whose distribution changes over time. In that case, the gap between the CV error and the error on future data will be large. To address this, you can try the following suggestions:

  • Drop time-biased features: time-related indices, features that shift substantially over time, etc.
  • Create features that are partially independent of time: from the date, derive "day of week", "weekend", and so on.
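The two suggestions above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than R, and the helper names `calendar_features` and `mean_shift` are hypothetical, not part of xgboost. The shift check is only one simple heuristic (difference in means, scaled by the training-period standard deviation), and what counts as a "big" shift is left to your judgment.

```python
from datetime import date
from statistics import mean, pstdev


def calendar_features(d: date) -> dict:
    """Derive features that repeat every week, so future dates take values
    the model has already seen (hypothetical helper for illustration)."""
    return {
        "day_of_week": d.weekday(),            # 0 = Monday ... 6 = Sunday
        "is_weekend": int(d.weekday() >= 5),   # 1 for Saturday or Sunday
    }


def mean_shift(train_values, recent_values) -> float:
    """Absolute difference between the period means, in units of the
    training-period standard deviation. A large value hints that the
    feature drifts over time (assumed heuristic, not a formal test)."""
    spread = pstdev(train_values)
    diff = abs(mean(recent_values) - mean(train_values))
    if spread == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / spread


# Replace a raw date with calendar features before training.
print(calendar_features(date(2016, 4, 2)))

# Compare a feature's values in the training period with a later period.
print(mean_shift([0, 0, 2, 2], [3, 3]))
```

A feature with a high `mean_shift` between the training months and the most recent months is a candidate for dropping or for re-expressing in a form that is more stable over time, for example as a difference from a recent average.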