Solved – How to approach time series regression with monthly dependent variable and quarterly independent variables

regression, time series

I am building a regression model where my goal is to obtain a monthly forecast of the dependent variable for the next 2 years. I have a monthly historical series available. For my independent variables, I only have quarterly historical data as well as quarterly forecasts for the next 2 years.

My current approach is converting the monthly dependent variable into a quarterly series by taking the simple average of the 3 months in each quarter. Thus my regression uses quarterly series for all variables. The quarterly forecast is converted to monthly by linear interpolation. If it matters, specifically I am using an ARMA model with exogenous regressors (using auto.arima in the R forecast package). I have 13 years of historical data.

My question is: would it be better to instead convert the independent variables from quarterly to monthly? I would just do a linear interpolation, which I think is reasonable behavior for these specific variables. Thus I would now be regressing monthly data on monthly data. The benefit I see is obtaining more data points: with 13 years of history I would have about 156 instead of 52. And when using lagged variables, I would lose a smaller percentage of the data. Are there any downsides to this approach or any other considerations I am overlooking?

Best Answer

The concern with the first approach is that it uses both aggregation and interpolation. Aggregation is a known risk in regression because of the ecological fallacy: a relationship estimated on averaged data need not hold at the finer level. Any interpretation that follows is therefore open to challenge, and interpolation adds another layer of uncertainty. An alternative is to keep only the month from which each quarterly data point was drawn, i.e. if the Q1 observation was drawn from March, drop the January and February observations and keep March. Perform your analysis with four monthly observations for each year, and then use interpolation to obtain the monthly forecast.
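The month-selection idea above can be sketched in base R as follows. This is only an illustration under stated assumptions: the series `y_monthly` is simulated, and the quarterly observations are assumed to come from the last month of each quarter (March, June, September, December). The quarterly average from the question is computed alongside for comparison.

```r
# Hypothetical monthly series: 13 years starting January 2005 (simulated data)
set.seed(1)
y_monthly <- ts(cumsum(rnorm(156)), start = c(2005, 1), frequency = 12)

# cycle() gives the month number (1..12) of each observation;
# keep only months 3, 6, 9 and 12, i.e. the last month of each quarter
is_quarter_end <- cycle(y_monthly) %% 3 == 0
y_quarter_end <- ts(y_monthly[is_quarter_end], start = c(2005, 1), frequency = 4)

# For comparison: the quarterly average used in the question's current approach
y_quarter_mean <- aggregate(y_monthly, nfrequency = 4, FUN = mean)
```

Both results are quarterly series of the same length, so either can be regressed on the quarterly independent variables; they differ only in how the three months of a quarter are reduced to one value.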

If you'd rather not simply drop data and/or would rather capture each observation's value somehow, you could attempt a moving average calculation or other smoothing techniques. For more about moving average and smoothing techniques: http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc42.htm

The second option, as you mentioned, is to use interpolation to obtain monthly data from the quarterly data. This can work, but you usually need to justify why you interpolate and which scheme you use (linear, cosine, cubic, etc.). If the quarterly observations are simply snapshot measurements at a point in time, interpolation might not be your best bet. Interpolation is best used if the quarterly measurements represent the entire quarter, or if you have a reason to capture the change between two quarters. As an alternative, you could just repeat the raw value of Q1 for January, February and March, and so on for the other quarters.
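The two quarterly-to-monthly conversions discussed above (repeating the quarterly value, and linear interpolation) might look like this in base R. The values in `x_quarterly` are made up, and placing each quarterly value at the last month of its quarter is an assumption about when the measurement applies; adjust `month_index` if your data are dated differently.

```r
# Hypothetical quarterly regressor: 8 quarters starting 2005 Q1 (made-up values)
x_quarterly <- ts(c(10, 12, 11, 15, 16, 18, 17, 20),
                  start = c(2005, 1), frequency = 4)
n_q <- length(x_quarterly)

# Option A: repeat each quarterly value for the three months of its quarter
x_step <- ts(rep(as.numeric(x_quarterly), each = 3),
             start = c(2005, 1), frequency = 12)

# Option B: linear interpolation, with each quarterly value placed at the
# last month of its quarter (months 3, 6, 9, ...); this placement is an assumption
month_index <- seq(3, by = 3, length.out = n_q)
interp <- approx(x = month_index, y = as.numeric(x_quarterly),
                 xout = 1:(3 * n_q), method = "linear",
                 rule = 2)  # rule = 2: months before the first knot take the first value
x_linear <- ts(interp$y, start = c(2005, 1), frequency = 12)
```

For a cubic scheme, `spline()` accepts the same `x`, `y` and `xout` arguments. Whichever scheme is used, the interpolated months carry no information beyond the quarterly values, so the larger sample size does not add independent evidence about the regressors.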

Related Solutions

Solved – Time series cross section forecasting with R

After a bit of research, I can give a partial answer. In his book, Wooldridge discusses Poisson and negative binomial regressions for cross-section and panel data, but for regression with lagged variables he only discusses Poisson regression. Maybe negative binomial is discussed in the new edition. The main conclusion is that a random effects Poisson regression with a lagged variable can be estimated as a mixed effects Poisson regression model. The detailed description can be found here. The mixed effects Poisson regression can be estimated in R with glmer from the package lme4. To adapt it to panel data, you will need to create the lagged variable explicitly. Then your estimation command should look something like this:

# family = poisson: glmer() in current lme4 versions does not accept quasi families
glmer(y ~ lagy + exo + (1 | Country), data = data, family = poisson)
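Creating the lagged variable explicitly, as described above, could be done along these lines in base R. The data frame `panel` and its `Year` column are hypothetical and simulated here; only the names `y`, `lagy`, `exo` and `Country` come from the command above.

```r
# Hypothetical panel: 3 countries, 5 years each (simulated counts)
set.seed(42)
panel <- data.frame(
  Country = rep(c("A", "B", "C"), each = 5),
  Year    = rep(2001:2005, times = 3),
  y       = rpois(15, lambda = 4),
  exo     = rnorm(15)
)

# Sort so that the lag is taken in time order within each country
panel <- panel[order(panel$Country, panel$Year), ]

# ave() applies the function separately within each Country group;
# the first year of each country has no previous value, so it becomes NA
panel$lagy <- ave(panel$y, panel$Country,
                  FUN = function(v) c(NA, head(v, -1)))
```

Taking the lag within each country matters: a plain shift of the whole column would carry the last value of one country into the first year of the next. Rows with a missing `lagy` are dropped by the model fit by default.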

You should also look into the pglm package suggested by @dickoa, but be sure to check whether it supports lagged variables. Yves Croissant, the creator of the pglm and plm packages, writes wonderful code, but unfortunately, in my personal experience, the code is not tested enough, so bugs crop up more frequently than in standard R packages.

Solved – Arima time series forecast (auto.arima) with multiple exogeneous variables in R

If your external regressors are causal for $y$, but not the other way around and do not cause each other, then ARIMA is definitely appropriate. VAR makes sense if your different time series all depend on each other.

For auto.arima() to work with external regressors, collect your regressors into a matrix X, which you feed into the xreg parameter of auto.arima(). (Of course, X must have the same number of rows as the time series y you are modeling.)

For forecasting, you will need the future values of your regressors, which you then again feed into the xreg parameter of forecast.

The help pages are ?auto.arima and ?forecast.Arima (note the capital A - this is not a typo. Don't ask me...).
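The workflow described above (a regressor matrix for fitting, then future regressor values for forecasting) can be sketched as follows. It assumes the forecast package is installed; the data, the column names `x1` and `x2`, and the 24-month horizon are simulated or chosen purely for illustration.

```r
library(forecast)

set.seed(123)
n <- 156  # months of history (13 years)
h <- 24   # months to forecast (2 years)

# Hypothetical regressors covering both the history and the forecast period
X <- cbind(x1 = rnorm(n + h), x2 = rnorm(n + h))

# Simulated dependent variable: linear in the regressors plus AR(1) noise
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- ts(5 + 2 * X[1:n, "x1"] - X[1:n, "x2"] + noise,
        start = c(2005, 1), frequency = 12)

X_hist   <- X[1:n, , drop = FALSE]              # same number of rows as y
X_future <- X[(n + 1):(n + h), , drop = FALSE]  # one row per forecast month

fit <- auto.arima(y, xreg = X_hist)
fc  <- forecast(fit, xreg = X_future)  # horizon is taken from the rows of xreg
```

The columns of `X_future` need to match those of `X_hist` in number and order, since the fitted coefficients are applied column by column.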
