Solved – How to approach time series regression with monthly dependent variable and quarterly independent variables

regression, time series

I am building a regression model where my goal is to obtain a monthly forecast of the dependent variable for the next 2 years. I have a monthly historical series available. For my independent variables, I only have quarterly historical data as well as quarterly forecasts for the next 2 years.

My current approach is to convert the monthly dependent variable into a quarterly series by taking the simple average of the 3 months in each quarter. Thus, my regression uses quarterly series for all variables. The resulting quarterly forecast is then converted to monthly by linear interpolation. If it matters, I am specifically using an ARMA model with exogenous regressors (fitted with auto.arima from the R forecast package). I have 13 years of historical data.
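To make the aggregation step concrete, here is a minimal Python sketch (the question itself works in R; this is only an illustration and not the forecast package's API). It assumes the monthly series starts in the first month of a quarter and covers whole quarters; the function name `monthly_to_quarterly_mean` and the data are hypothetical.

```python
from typing import List


def monthly_to_quarterly_mean(monthly: List[float]) -> List[float]:
    """Average each consecutive block of 3 months into one quarterly value.

    Assumption: the series starts in the first month of a quarter
    (e.g. January) and contains a whole number of quarters.
    """
    if len(monthly) % 3 != 0:
        raise ValueError("series length must be a multiple of 3 (whole quarters)")
    return [sum(monthly[i:i + 3]) / 3 for i in range(0, len(monthly), 3)]


# Two years of made-up monthly values (hypothetical data, for illustration only).
monthly_y = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
             22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]
quarterly_y = monthly_to_quarterly_mean(monthly_y)
print(quarterly_y)  # [11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0, 32.0]

# 13 years of monthly history (156 months) collapses to 52 quarterly points.
print(len(monthly_to_quarterly_mean([0.0] * 156)))  # 52
```

With 13 years of history, this aggregation leaves 52 quarterly observations for the regression.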

My question is: would it be better to instead convert the independent variables from quarterly to monthly? I would just do a linear interpolation, which I think is reasonable behavior for these specific variables. I would then be regressing monthly data on monthly data. The benefit I see is more data points: with 13 years of history I would have about 156 monthly observations instead of 52 quarterly ones. And when using lagged variables, I would lose a smaller percentage of the data. Are there any downsides to this approach, or any other considerations I am overlooking?
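A minimal sketch of the quarterly-to-monthly linear interpolation of a regressor, again in Python for illustration. One choice the question leaves open is where within the quarter each quarterly value sits; here that is an assumption exposed as the hypothetical `anchor` parameter (1 = middle month, 2 = last month of the quarter). Months before the first anchor or after the last one are simply held flat, which is also an assumption rather than the only option.

```python
from typing import List


def quarterly_to_monthly_linear(quarterly: List[float], anchor: int = 1) -> List[float]:
    """Linearly interpolate a quarterly series onto a monthly grid.

    Assumption: quarter q's value is placed at month 3*q + anchor
    (anchor=1 -> middle month of the quarter, anchor=2 -> last month).
    Months outside the first/last anchor are held flat at the nearest value.
    """
    if not 0 <= anchor <= 2:
        raise ValueError("anchor must be 0, 1 or 2")
    n_months = 3 * len(quarterly)
    positions = [3 * q + anchor for q in range(len(quarterly))]
    monthly: List[float] = []
    for m in range(n_months):
        if m <= positions[0]:
            monthly.append(float(quarterly[0]))
        elif m >= positions[-1]:
            monthly.append(float(quarterly[-1]))
        else:
            q = (m - anchor) // 3       # index of the anchor at or before month m
            w = (m - positions[q]) / 3  # fractional distance towards the next anchor
            monthly.append((1 - w) * quarterly[q] + w * quarterly[q + 1])
    return monthly


# Hypothetical quarterly regressor values for three quarters.
monthly_x = quarterly_to_monthly_linear([3.0, 6.0, 9.0])
print([round(v, 2) for v in monthly_x])  # [3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]

# 52 quarters of history become 156 monthly regressor values.
print(len(quarterly_to_monthly_linear([0.0] * 52)))  # 156
```

The anchor choice matters: putting the value in the middle month treats it as a quarterly average, while putting it in the last month treats it as an end-of-quarter reading.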

Best Answer

The concern with the first approach is that it uses both aggregation and interpolation, and aggregation is a known risk in regression because of the ecological fallacy (relationships estimated on aggregated data need not hold at the finer, monthly level). Thus, any interpretation that follows is open to attack, and interpolation adds another layer of uncertainty. An alternative would be to keep only the month from which the quarterly data point was drawn: e.g., if the Q1 observation was drawn from March, drop the January and February observations and keep March. Perform your analysis with four monthly observations per year, and then use interpolation to forecast by month.
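Keeping only the observation month amounts to taking every third month of the monthly series. A minimal Python sketch, assuming the series starts in January; `keep_observation_month` and its `month_in_quarter` parameter are hypothetical names.

```python
from typing import List


def keep_observation_month(monthly: List[float], month_in_quarter: int = 2) -> List[float]:
    """Keep one month per quarter: the month the quarterly regressor refers to.

    Assumption: the series starts in January. month_in_quarter=2 keeps
    Mar, Jun, Sep, Dec (quarter-end); 0 would keep Jan, Apr, Jul, Oct.
    """
    if not 0 <= month_in_quarter <= 2:
        raise ValueError("month_in_quarter must be 0, 1 or 2")
    return monthly[month_in_quarter::3]


# Months numbered 1..12 for one year (hypothetical values).
print(keep_observation_month(list(range(1, 13))))  # [3, 6, 9, 12]
```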

If you'd rather not simply drop data, or would rather have every observation contribute somehow, you could try a moving average or another smoothing technique. For more about moving-average and smoothing techniques, see the NIST/SEMATECH e-Handbook of Statistical Methods: http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc42.htm
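As one example of the smoothing route, a trailing 3-month moving average keeps every month in play. A minimal Python sketch; the function name and the choice of a trailing rather than centered window are assumptions. At each quarter-end month, the 3-month trailing average equals the simple average of that quarter, so sampling it at March, June, September and December reproduces the quarterly means.

```python
from typing import List, Optional


def trailing_moving_average(series: List[float], window: int = 3) -> List[Optional[float]]:
    """Trailing moving average; the first window-1 positions are None (not enough history)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


# Six hypothetical monthly values, Jan..Jun.
ma = trailing_moving_average([10, 11, 12, 13, 14, 15])
print(ma)                        # [None, None, 11.0, 12.0, 13.0, 14.0]
print([ma[i] for i in (2, 5)])   # [11.0, 14.0], i.e. the Q1 and Q2 means
```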

The second option, as you mentioned, is to use interpolation to obtain monthly data from quarterly data. This is reliable, but you usually need to justify why you use interpolation and which scheme you choose (linear, cosine, cubic, etc.). If the quarterly observations are simply snapshot measurements at a point in time, interpolation might not be your best bet. Interpolation is best used if the quarterly measurements represent the entire quarter, or if you have a reason to capture the change between two quarters. As an alternative, you could simply repeat the raw Q1 value for January, February, and March, and likewise for the other quarters.
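Repeating each quarter's value across its three months is a step (piecewise-constant) conversion. A minimal Python sketch; `repeat_quarterly` and the values are hypothetical.

```python
from typing import List


def repeat_quarterly(quarterly: List[float]) -> List[float]:
    """Step conversion: each quarter's value is used for all 3 of its months."""
    return [value for value in quarterly for _ in range(3)]


q = [100.0, 104.0]  # hypothetical Q1 and Q2 values
print(repeat_quarterly(q))  # [100.0, 100.0, 100.0, 104.0, 104.0, 104.0]
```

Unlike linear interpolation, this introduces no values that were never observed, at the cost of a jump at each quarter boundary.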