Solved – Forecast time series data with external variables

forecasting, lags, multiple regression, time series

Currently I'm working on a project to forecast time series data (monthly data). I am using R for the forecasting.
I have 1 dependent variable (y) and 3 independent variables (x1, x2, x3). The y variable has 73 observations, and so do the other 3 variables (also 73 each), covering January 2009 to January 2015.
I have checked the correlations and p-values, and all three variables are significant enough to include in a model.
My question is: how can I make a good prediction using all the independent variables? I don't have future values for these variables.
Let's say I want to predict my y variable 2 years ahead (into 2017). How can I do this?

I tried the following code:

    model = arima(y, order = c(0, 2, 0), xreg = externaldata) 

Can I do a prediction of the y value over 2 years with this code?

I also tried a regression code:

    reg = lm(y ~ x1 + x2 + x3) 

But how do I account for time in this code? How can I forecast what my y value will be over, let's say, 2 years? I am new to statistics and forecasting. I have done some reading and came across the lag value, but how can I use a lag value in the model to do forecasting?

Actually, my overall question is: how can I forecast time series data with external variables for which I have no future values?

Best Answer

If you fit a model using external variables and want to forecast from this model, you will need (forecasted) future values of the external variables, plain and simple. There is no way around this.
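
To see why, here is a minimal sketch (assuming externaldata is your 73 x 3 matrix of regressors; futurexreg is a hypothetical 24 x 3 matrix of their future values):

    # Model for y fit with the three regressors (externaldata = cbind(x1, x2, x3))
    fit <- arima(y, order = c(0, 2, 0), xreg = externaldata)
    
    # This stops with an error: the model was fit with xreg, so predict()
    # insists on matching future regressor values via newxreg
    # predict(fit, n.ahead = 24)
    
    # This works, but only once you have a 24 x 3 matrix 'futurexreg' holding
    # the (forecasted) future values of x1, x2, x3
    # predict(fit, n.ahead = 24, newxreg = futurexreg)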

There are of course different ways of forecasting your explanatory variables. You can use the last observed value (the "naive random walk" forecast) or the overall mean. You can simply set them to zero if this is a useful value for them (e.g., special events that happened in the past like an earthquake, which you don't anticipate to recur). Or you could fit and forecast a time series model to these explanatory variables themselves, e.g., using auto.arima.
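
The auto.arima route could look roughly like this (a sketch, assuming y, x1, x2, x3 are monthly ts objects and you use the forecast package; the object names are mine):

    library(forecast)
    
    h <- 24  # forecast 2 years of monthly data
    
    # Forecast each explanatory variable with its own automatically selected model
    x1_fc <- forecast(auto.arima(x1), h = h)$mean
    x2_fc <- forecast(auto.arima(x2), h = h)$mean
    x3_fc <- forecast(auto.arima(x3), h = h)$mean
    
    # Fit the model for y with the historical regressors ...
    fit <- auto.arima(y, xreg = cbind(x1 = x1, x2 = x2, x3 = x3))
    
    # ... and forecast y using the forecasted regressor values
    fc <- forecast(fit, xreg = cbind(x1 = x1_fc, x2 = x2_fc, x3 = x3_fc))
    plot(fc)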

The alternative is to fit a model to your $y$ values without explanatory variables, by removing the xreg parameter, then to forecast $y$ using this model. One advantage is that this may even capture regularities in your explanatory variables. For instance, your ice cream sales may be driven by temperature, and you don't have good forecasts for temperature a few months ahead... but temperature is seasonal, so simply fitting a model without temperature yields a seasonal model, and your seasonal forecasts may actually be pretty good even if you don't include the actual driver of sales.
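
A sketch of that alternative, again assuming y is a monthly ts object and using the forecast package:

    library(forecast)
    
    # Model y on its own; with monthly data, auto.arima can pick up the
    # seasonality that your unobserved drivers (e.g., temperature) induce
    fit_univ <- auto.arima(y)
    
    # Forecast 2 years (24 months) ahead
    fc_univ <- forecast(fit_univ, h = 24)
    plot(fc_univ)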

I recommend the free online textbook Forecasting: Principles and Practice by Hyndman and Athanasopoulos, especially its chapter on multiple regression (unfortunately, there is nothing about ARIMAX there), as well as Rob Hyndman's blog post "The ARIMAX model muddle".