How to run Random Forest when there is temporal structure in the data

random-forest, time-series

I am used to data sets that don't have a time component. In reading up on time series data, I learned the importance of transforming the data into a stationary series before applying an ARIMA model to a univariate time series. But what if I have a multivariate data set (meaning: many features potentially explaining one response variable) and I would like to use an advanced ML technique such as a regression random forest? Before applying the random forest, should I first transform the data into a stationary time series?

Thanks!

More details with specific questions below:

- For each point in time (t = 0, 1, …, T) I have a value for the response variable y and the features x1, x2, …, xN.
- I need to predict future values of y based on knowing x1, x2, …, xN.
- Note: I cannot use time as one of the features in predicting y.
- By the way: if I disregard any temporal structure in the data, run the random forest on a randomly chosen training set, and evaluate the error on the remaining test set, I get a pretty good result. Note: since I disregarded the temporal order, my test set observations do not necessarily occur after my training set observations.
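To make the last point concrete, here is a minimal stdlib-only Python sketch contrasting a random split with a chronological one. The function names `random_split` and `chronological_split` and the 80/20 proportion are illustrative assumptions, not part of the original post. With a random split, test points are interleaved with training points, so the model can be scored on time steps that lie before some of the data it learned from; a chronological split keeps every test point after every training point, which mirrors actual forecasting.

```python
import random

def random_split(n, test_frac=0.2, seed=0):
    """Shuffle time indices 0..n-1 and split them, ignoring temporal order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def chronological_split(n, test_frac=0.2):
    """Train on the earliest time points, test on the most recent ones."""
    n_test = int(n * test_frac)
    cut = n - n_test
    return list(range(cut)), list(range(cut, n))

n = 100
train_r, test_r = random_split(n)
train_c, test_c = chronological_split(n)

# Random split: some test indices fall before the latest training index,
# i.e. the model is partly evaluated on the "past".
print(any(t < max(train_r) for t in test_r))
# Chronological split: every test index comes after every training index.
print(max(train_c) < min(test_c))  # True
```

The same idea extends to walk-forward evaluation, where the cut point is moved forward step by step and the model is refit on everything before it each time.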
Questions:
1- Would I get better results if I made the data stationary and then ran the random forest on the transformed data?
2- If so, do I need to apply the same transformation to y as well as to the x variables? E.g., if, to remove temporal structure in y, I difference y and remove its seasonal component, do I also need to difference all the x and remove any seasonal component in the x before running the random forest?
3- Once I have made the series stationary, is it OK to randomly pick training and test sets without respecting temporal order (i.e., without having test set observations occur after training set observations)?
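The mechanics behind question 2 can be sketched in stdlib-only Python. This is a minimal illustration, assuming the same transformation (an ordinary first difference, or a seasonal difference with `lag=s`) is applied to y and to every x so that all series stay aligned on the same time index; the helper names are hypothetical. It shows how to do the transformation, not whether it improves a random forest.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t = lag, ..., len(series) - 1.

    lag=1 is the ordinary first difference; lag=s (e.g. 12 for monthly data)
    is a seasonal difference. The result is `lag` elements shorter than the input.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def difference_all(y, xs, lag=1):
    """Apply the same difference to the response y and to every feature in xs,
    so that position i of every output still refers to the same time point."""
    return difference(y, lag), [difference(x, lag) for x in xs]

def undo_first_difference(last_level, predicted_diff):
    """Map a predicted first difference back to the original scale: y_t = y_{t-1} + dy_t."""
    return last_level + predicted_diff

y  = [10, 12, 15, 19, 24]
x1 = [1, 2, 4, 7, 11]
x2 = [5, 5, 6, 6, 7]

dy, (dx1, dx2) = difference_all(y, [x1, x2])
print(dy)   # [2, 3, 4, 5]
print(dx1)  # [1, 2, 3, 4]
print(dx2)  # [0, 1, 0, 1]

# If a model predicts the next difference to be 6, the forecast level is 24 + 6.
print(undo_first_difference(y[-1], 6))  # 30
```

Seasonal differencing followed by first differencing is just two calls, for example `difference(difference(y, 12), 1)`; each call drops rows from the start, so calling `difference_all` once per step keeps y and the x series the same length.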

Best Answer

To apply Random Forest you don't need to check any assumptions. Take the value at time t as the target y, and the values at t-1, t-2, t-3 (all lags you feel would help) as the features x.
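The lag construction described above can be sketched in stdlib-only Python. This is a minimal illustration under assumptions: `make_lagged_dataset` is a hypothetical helper, and which series to lag (past values of y, of the x features, or both) and which lags to use are choices left to you. With lags of 1 or more, each row uses only values from before time t, and time itself is never a feature.

```python
def make_lagged_dataset(target, feature_series, lags=(1, 2, 3)):
    """Build one row per time t with target[t] as the label and
    s[t - k] for every series s in feature_series and every k in lags as features.
    Rows start at t = max(lags) so that every lagged value exists."""
    start = max(lags)
    X, y_out = [], []
    for t in range(start, len(target)):
        X.append([s[t - k] for s in feature_series for k in lags])
        y_out.append(target[t])
    return X, y_out

y  = [3, 5, 4, 6, 8, 7]
x1 = [0, 1, 0, 1, 0, 1]

# Features: y and x1 at lags 1 and 2 -> [y[t-1], y[t-2], x1[t-1], x1[t-2]]
X, target = make_lagged_dataset(y, [y, x1], lags=(1, 2))
for row, label in zip(X, target):
    print(row, "->", label)
# [5, 3, 1, 0] -> 4
# [4, 5, 0, 1] -> 6
# [6, 4, 1, 0] -> 8
# [8, 6, 0, 1] -> 7
```

The resulting rows can be passed to any regression random forest implementation, for example scikit-learn's `RandomForestRegressor`. A lag of 0 would only be safe for x series whose current value is known when the forecast is made, never for y itself.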
But instead of applying RF and similar methods, consider time series techniques such as the Hybrid Model in R, which gives you an ensemble of ARIMA, ETS, NN, TBATS, THETAM and STLM models.

Another algorithm that handles multiple levels of seasonality is Facebook's Prophet model:

https://machinelearningstories.blogspot.in/2017/05/facebooks-phophet-model-for-forecasting.html
