If you have many observations relative to the number of variables, a handful of irrelevant variables shouldn't hurt performance much. It is still better to remove them if possible; the problem is identifying them in a manner that isn't just ad hoc.
But there are a few algorithms for variable selection that can fine-tune things. This paper ranks the variables and then performs a stepwise addition procedure to find the best subset.
This paper generates p-values for variable importances. Those with high p-values can be trimmed off, which can improve results.
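To illustrate the general idea of p-values for variable importances, here is a minimal sketch using a permutation test: refit the forest on a shuffled response to get a null distribution of importances, then compare the observed importances against it. This is an illustration of the concept under assumed synthetic data, not necessarily the exact procedure of the paper mentioned above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# toy data: only the first two of six features carry signal
X = rng.normal(size=(300, 6))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

def importances(X, y, seed=0):
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(X, y)
    return rf.feature_importances_

observed = importances(X, y)

# null distribution: refit on a permuted response, so any importance
# that remains is due to chance alone
n_perm = 30
null = np.array([importances(X, rng.permutation(y), seed=i)
                 for i in range(n_perm)])

# permutation p-value per feature, with +1 smoothing
pvals = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)
print(np.round(pvals, 3))
```

Features whose p-values stay high (here, the four noise columns) are candidates for trimming.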
> How is this different from utilizing 'later' data in the time series as testing?
The approach you quote is called "rolling origin" forecasting: the origin from which we forecast out is "rolled forward", and the training data is updated with the newly available information. The simpler approach is "single origin forecasting", where we pick a single origin.
The advantage of rolling origin forecasting is that it simulates a forecasting system over time. In single origin forecasting, we might by chance pick an origin where our system works very well (or very badly), which might give us an incorrect idea of our system's performance.
One disadvantage of rolling origin forecasting is its higher data requirement. If we want to forecast out 10 steps with at least 50 historical observations, then we can do this single-origin with 60 data points overall. But if we want 10 overlapping rolling origins (origins at observations 50 through 59, the last forecast ending at observation 69), then we need 69 data points.
The other disadvantage is of course its higher complexity.
Needless to say, you should not use "later" data in rolling origin forecasting, either, but only use data prior to the origin you are using in each iteration.
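The scheme described above can be sketched as follows. This is a minimal illustration on a made-up series, assuming scikit-learn and simple lagged features; note that in each iteration the forest is fit only on data before the current origin, and multi-step forecasts are produced recursively so no "later" data leaks in.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# toy series of 80 points, purely for illustration
series = np.sin(np.arange(80) / 5) + rng.normal(scale=0.1, size=80)

n_lags, horizon, min_train = 5, 10, 50

def make_lagged(s, n_lags):
    # each row is (y_{t-n_lags}, ..., y_{t-1}) with target y_t
    X = np.column_stack([s[i:len(s) - n_lags + i] for i in range(n_lags)])
    return X, s[n_lags:]

def recursive_forecast(rf, history, horizon, n_lags):
    hist = list(history)
    preds = []
    for _ in range(horizon):
        p = rf.predict(np.array(hist[-n_lags:])[None, :])[0]
        preds.append(p)
        hist.append(p)  # feed the prediction back in as a lag
    return np.array(preds)

errors = []
for origin in range(min_train, len(series) - horizon + 1):
    # train strictly on data prior to the origin
    X, y = make_lagged(series[:origin], n_lags)
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    preds = recursive_forecast(rf, series[:origin], horizon, n_lags)
    actual = series[origin:origin + horizon]
    errors.append(np.mean(np.abs(preds - actual)))

print(f"MAE averaged over {len(errors)} origins: {np.mean(errors):.3f}")
```

Averaging the error over all origins is what smooths out a lucky or unlucky choice of single origin.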
> Should I be validating my RF regression model with this approach as well as on the testing data set?
If you have enough data, a rolling origin evaluation will always inspire more confidence in me than a single origin evaluation, because it will hopefully average out the impact of the origin.
> Furthermore, is this sort of 'autoregressive' approach to random forest regression valid for time series, and do I even need to create this many lagged variables if I'm interested in a prediction 10 minutes in the future?
Yes, this is valid, and the rolling vs. single origin distinction applies to any predictive exercise. It doesn't depend on whether you use random forests or ARIMA or anything else.
Whether you need your lagged variables is something we can't counsel you on. It might be best to talk to a subject matter expert, who might also suggest other inputs. Just try your RF with the lagged inputs vs. without. And also compare to standard benchmarks like ARIMA or ETS or even simpler methods, which can be surprisingly hard to beat.
Best Answer
It works well, but only if the features are prepared in such a way that the ordering of the rows no longer matters.
E.g. for a univariate time series $y_i$, you would use $y_i$ as response and e.g. the following features:
- Lagged versions $y_{i-1}$, $y_{i-2}$, $y_{i-3}$ etc.
- Differences of appropriate order, e.g. $y_{i-1} - y_{i-2}$, or $y_{i-1} - y_{i-8}$ if weekly seasonality is expected and the observations are daily.
- Integer- or dummy-coded periodic time information such as month of year, day of week, hour of day, minute of hour etc.
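The three kinds of features above can be built in a few lines. A minimal sketch with pandas, assuming an hourly toy series (column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range("2023-01-01", periods=500, freq="h")
df = pd.DataFrame({"y": rng.normal(size=500).cumsum()}, index=idx)

# lagged versions y_{i-1}, y_{i-2}, y_{i-3}
for lag in (1, 2, 3):
    df[f"lag{lag}"] = df["y"].shift(lag)

# differences of appropriate order, e.g. y_{i-1} - y_{i-2};
# for daily data with weekly seasonality this would be shift(1) - shift(8),
# here with hourly data a daily difference is y_{i-1} - y_{i-25}
df["diff1"] = df["y"].shift(1) - df["y"].shift(2)
df["diff_daily"] = df["y"].shift(1) - df["y"].shift(25)

# integer-coded periodic time information
df["hour"] = df.index.hour
df["weekday"] = df.index.dayofweek
df["month"] = df.index.month

# drop rows whose lags reach back before the start of the series
df = df.dropna()
```

After this step, each row is self-contained, so the row order no longer matters to the model.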
The same approach works for different modelling techniques, including linear regression, neural nets, boosted trees etc.
An example is the following (using a binary target "temperature increase" (y/n)):
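Along those lines, here is a minimal sketch of such a setup in Python. The data is synthetic and the variable names `y` (year) and `m` (month) simply mirror the ones mentioned below; this is an assumed reconstruction for illustration, not the original example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# made-up daily temperatures with a seasonal cycle plus noise
rng = np.random.default_rng(3)
idx = pd.date_range("2015-01-01", periods=2000, freq="D")
temp = (10 + 8 * np.sin(2 * np.pi * idx.dayofyear / 365)
        + rng.normal(scale=2, size=2000))

df = pd.DataFrame({"temp": temp}, index=idx)
df["increase"] = (df["temp"].diff() > 0).astype(int)  # binary target (y/n)
for lag in (1, 2, 3):
    df[f"lag{lag}"] = df["temp"].shift(lag)
df["y"] = df.index.year   # integer-coded year
df["m"] = df.index.month  # integer-coded month
df = df.dropna()

X = df.drop(columns=["temp", "increase"])
target = df["increase"]

# keep the time order: fit on the first 80 %, evaluate on the rest
split = int(len(df) * 0.8)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X.iloc[:split], target.iloc[:split])
acc = accuracy_score(target.iloc[split:], rf.predict(X.iloc[split:]))
print(f"holdout accuracy: {acc:.3f}")
```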
Replacing variables "y" and "m" by factors would probably improve the logistic regression. But since the question was about random forests, I leave this to the reader.