How is this different from utilizing 'later' data in the time series
as testing?
The approach you quote is called "rolling origin" forecasting: the origin from which we forecast out is "rolled forward", and the training data is updated with the newly available information. The simpler approach is "single origin forecasting", where we pick a single origin.
The advantage of rolling origin forecasting is that it simulates a forecasting system over time. In single origin forecasting, we might by chance pick an origin where our system works very well (or very badly), which might give us an incorrect idea of our system's performance.
One disadvantage of rolling origin forecasting is its higher data requirement. If we want to forecast out 10 steps with at least 50 historical observations, then we can do this single-origin with 60 data points overall. But if we want to do 10 overlapping rolling origins, then we need 70 data points.
The other disadvantage is of course its higher complexity.
Needless to say, you should not use "later" data in rolling origin forecasting either; in each iteration, only use data prior to the current origin.
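A rolling origin evaluation can be sketched as follows. This is a minimal illustration, not any particular library's API: the naive last-value forecast is a placeholder for whatever model (RF, ARIMA, ...) you actually use, and the `min_train=50`, `horizon=10` defaults mirror the numbers above.

```python
import numpy as np

def rolling_origin_errors(series, min_train=50, horizon=10):
    """Walk the origin forward, refitting on all data before each origin.

    The forecast here is a naive last-value placeholder standing in for
    any model; only data strictly prior to the origin is ever used.
    """
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]                   # data before this origin
        forecast = np.repeat(train[-1], horizon)  # placeholder naive forecast
        actual = series[origin:origin + horizon]
        errors.append(np.mean(np.abs(actual - forecast)))
    # Average the error over all origins, hopefully washing out
    # any single lucky or unlucky origin.
    return np.mean(errors)

series = np.arange(70, dtype=float)  # 70 points allow multiple rolling origins
print(round(rolling_origin_errors(series), 2))
```

With a single origin you would get one error number; the rolling version averages over every admissible origin, which is exactly the "simulate the forecasting system over time" idea.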
Should I be validating my RF regression model with this
approach as well as on the testing data set?
If you have enough data, a rolling origin evaluation will always inspire more confidence in me than a single origin evaluation, because it will hopefully average out the impact of the origin.
Furthermore, is this sort
of 'autoregressive' approach to random forest regression valid for
time series, and do I even need to create this many lagged variables
if I'm interested in a prediction 10 minutes in the future?
Yes, this kind of autoregressive setup with lagged inputs is valid for time series, and the rolling vs. single origin question applies to any predictive exercise. It doesn't depend on whether you use random forests or ARIMA or anything else.
Whether you need your lagged variables is something we can't counsel you on. It might be best to talk to a subject matter expert, who might also suggest other inputs. Just try your RF with the lagged inputs vs. without. And also compare to standard benchmarks like ARIMA or ETS or even simpler methods, which can be surprisingly hard to beat.
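Building the lagged inputs themselves is mechanical. A minimal sketch (the helper name `make_lag_matrix` is my own, not a library function): row *t* of the design matrix holds the previous `n_lags` observations as features for target `series[t]`, which is what "autoregressive RF" amounts to.

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Autoregressive design matrix: row t holds
    series[t-1], ..., series[t-n_lags] as features for target series[t]."""
    X = np.column_stack([series[n_lags - k - 1 : len(series) - k - 1]
                         for k in range(n_lags)])
    y = series[n_lags:]
    return X, y

series = np.arange(10.0)
X, y = make_lag_matrix(series, n_lags=3)
print(X[0], y[0])  # lags [2., 1., 0.] are the features for target 3.0
```

You can then fit your RF on `X, y` with different values of `n_lags` and compare the resulting rolling-origin errors against ARIMA/ETS benchmarks.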
Best Answer
RFs, of course, can identify and model a long-term trend in the data. However, the issue becomes more complicated when you are trying to forecast out to never-before-seen values, as you often are with time series data. For example, if you see that activity increases linearly between 1915 and 2015, you would expect it to continue to do so in the future. An RF, however, would not make that forecast: it would predict the same activity as 2015 for all future values.
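A minimal sketch of such a script, assuming scikit-learn's RandomForestRegressor (bootstrap is disabled so every tree fits the training data exactly and the output is deterministic):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Activity grows perfectly linearly: the activity in year t is simply t.
years = np.arange(1915, 2016).reshape(-1, 1)
activity = np.arange(1915, 2016).astype(float)

# bootstrap=False lets every tree see the full data and fit it exactly,
# making the demonstration deterministic.
rf = RandomForestRegressor(n_estimators=10, bootstrap=False, random_state=0)
rf.fit(years, activity)

# Three in-sample years, then three years beyond the training range.
query = np.array([[2013], [2014], [2015], [2016], [2017], [2018]])
print([int(p) for p in rf.predict(query)])  # [2013, 2014, 2015, 2015, 2015, 2015]
```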
The above script will print 2013, 2014, 2015, 2015, 2015, 2015: the RF cannot extrapolate past the largest value it saw in training. Adding lag variables into the RF does not help in this regard. So be careful; I'm not sure that adding trend data to your RF is going to do what you think it will.