I have two and a half years of time series data. I am running xgb.cv on the data from January 2014 to March 2016, and it gives very good results, but when I predict on new data (April 2016 to June 2016), the accuracy gets worse. I am using the R xgboost package. Any suggestions for improving accuracy?
Solved – Time Series and XGBoost
boosting, time series
Best Answer
It is likely that your features have time biases. For example, suppose the month and year can be inferred from your features. You would see very good performance during CV, because every month in the training range has enough data points and the folds are typically sampled at random from the whole period, so each validation fold comes from months the model has already seen. However, the same model cannot predict the future (April 2016 to June 2016) very well, because those months never appear in training. Another example is a feature whose distribution (density) changes over time. In that case, the gap between the CV error and the error on future predictions will be large. To address this, you can use the following suggestions: