The concern with the first approach is that it combines aggregation and interpolation, and aggregation is a known risk in regression because of the ecological fallacy. Any interpretation that follows is therefore open to attack, and interpolation adds another degree of uncertainty. An alternative would be to keep only the month in which each quarterly data point was actually drawn - i.e. if the Q1 observation was drawn in March, then drop the January and February observations and keep March. Perform your analysis with four monthly observations per year, and then use interpolation to forecast by month.
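As a minimal sketch of the keep-one-month idea (assuming, purely for illustration, that each quarterly figure was measured in the last month of its quarter), using pandas on a toy monthly series:

```python
import pandas as pd
import numpy as np

# Hypothetical monthly series; assume each quarterly figure was actually
# measured in the final month of its quarter (Mar, Jun, Sep, Dec).
idx = pd.to_datetime([f"{y}-{m:02d}-01" for y in (2020, 2021) for m in range(1, 13)])
monthly = pd.Series(np.arange(24, dtype=float), index=idx)

# Keep only the months the quarterly observations were drawn from,
# dropping the other two months of each quarter.
quarter_sampled = monthly[monthly.index.month.isin([3, 6, 9, 12])]
print(quarter_sampled)  # four observations per year
```

Which months you keep depends, of course, on when your quarterly figures were actually measured.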
If you'd rather not simply drop data and/or would rather capture each observation's value somehow, you could try a moving-average calculation or other smoothing techniques. For more on moving averages and smoothing techniques:
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc42.htm
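For a quick sketch of what such smoothing looks like in practice (toy numbers, pandas built-ins), a trailing simple moving average and an exponentially weighted alternative:

```python
import pandas as pd

# Toy monthly series (values are illustrative).
x = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 16.0, 18.0, 17.0])

# Trailing 3-month simple moving average: each point averages the
# current and two preceding observations, smoothing short-run noise.
sma3 = x.rolling(window=3).mean()

# Exponentially weighted moving average: recent observations get
# geometrically more weight than older ones.
ewma = x.ewm(span=3, adjust=False).mean()

print(sma3.round(2).tolist())
```

The window length (here 3) is a tuning choice; the NIST handbook linked above discusses how to pick it.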
The second option, as you mentioned, is to use interpolation (to obtain monthly data from quarterly data). This is reliable, but you usually need to justify why you use interpolation - and which scheme you wish to use (linear, cosine, cubic, etc.). If the quarterly observations are simply snapshot measurements at a point in time, interpolation might not be your best bet. Interpolation is best used when the quarterly measurements represent the entire quarter, or when you have a reason to capture the difference between two quarters. So as an alternative, you could simply repeat the raw value of Q1 for Jan, Feb, and Mar, and likewise for the other quarters.
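Both alternatives are one-liners with pandas resampling; a minimal sketch on made-up quarterly figures:

```python
import pandas as pd

# Hypothetical quarterly observations, stamped at quarter starts.
q = pd.Series([100.0, 106.0, 103.0, 109.0],
              index=pd.to_datetime(["2020-01-01", "2020-04-01",
                                    "2020-07-01", "2020-10-01"]))

# Option A: repeat the quarterly value for each month of the quarter
# (appropriate when the figure represents the whole quarter).
monthly_repeat = q.resample("MS").ffill()

# Option B: linear interpolation between quarterly points
# (appropriate when you want to capture the drift between quarters).
monthly_interp = q.resample("MS").interpolate("linear")
```

Under option B the Jan-to-Apr step of +6 is spread evenly, giving 102 in February and 104 in March; under option A those months both stay at 100.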
You can use MIDAS regressions to predict a low-frequency target variable with a set of high-frequency (hf) covariates. If you have a larger set of hf covariates, the simplest way is to use forecast combinations; see [1] for such an application of the MIDAS regression model to nowcasting the GDP growth rate using a large set of hf covariates. Also see the Matlab toolbox for MIDAS regressions, which covers many different contexts and actual data examples.
Briefly about MIDAS regression:
Formally, let $t\in [T]$ be the low-frequency (lf) time period, say quarterly, and $y_t$ be the lf target variable (so you observe it each quarter). Further, let $x_{t-j/m}$, $j \in \{0, \dots, m-1\}$, be an hf covariate that we observe within period $(t-1, t]$, i.e. we observe $x$ $m$ times per $t$ time period, and the notation $j/m$ means that $x$ is lagged by a fraction of time period $t$. Say we have daily hf data. Assuming 66 days within the quarter (a reasonable assumption, e.g. when counting trading days, though it can be generalized to any number and can even vary across quarters), this means that we have 66 $x$ observations for each $y$ observation. To use the information efficiently, we may wish to use all 66 observations in our regression model, so we can write the model as
$ \qquad \qquad y_t = \alpha + \sum_{j=0}^{m-1} \beta_{j} x_{t-1-j/m} + \epsilon_t$
where we subtract an additional lag in $x$ to make the regression predictive. This is called the U-MIDAS (unrestricted MIDAS) regression model. When $m$ is large, as in the quarterly/daily example, the model suffers from a parameter-proliferation problem (many parameters and typically small samples to estimate them). In MIDAS regression we parameterize the lag polynomial $\sum_{j=0}^{m-1} \beta_{j}$ so that it depends on a few, say one or two, parameters. In this case, the model is
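Since U-MIDAS is just OLS on one regressor per hf lag, it can be sketched in a few lines of NumPy. The example below uses $m=3$ (e.g. monthly covariates for a quarterly target) with simulated data and made-up true coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 3, 200                        # 3 hf obs per lf period, 200 lf periods
x = rng.standard_normal(m * T)       # high-frequency covariate

# Made-up true parameters for the simulation.
beta_true = np.array([0.5, 0.3, 0.1])
alpha_true = 1.0

# Column t holds the m hf lags from period t-1, most recent first:
# x_{t-1}, x_{t-1-1/m}, x_{t-1-2/m}  (predictive timing).
X = np.column_stack([x[(t - 1) * m:t * m][::-1] for t in range(1, T)])
y = alpha_true + beta_true @ X + 0.1 * rng.standard_normal(T - 1)

# U-MIDAS: one unrestricted coefficient per hf lag, estimated by OLS.
Z = np.column_stack([np.ones(T - 1), X.T])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(coef.round(2))                 # roughly [1.0, 0.5, 0.3, 0.1]
```

With $m=66$ this same design matrix would have 66 slope columns against perhaps a few dozen quarterly observations, which is exactly the parameter-proliferation problem that motivates restricted MIDAS below.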
$ \qquad \qquad y_t = \alpha + \beta \omega (\theta) X_{t-1} + \epsilon_t$
where $\beta$ is the usual regression slope coefficient, $\omega (\theta)$ is some weight function, typically taken to be the exponential Almon or beta density function, $\theta$ is a low-dimensional parameter of $\omega(\cdot)$ that determines the shape of the weight function, and $X_{t-1} \in \mathbf{R}^m$ is a vector of hf lags. The slope coefficient is in fact needed only if you want to know the aggregate effect of $X_{t-1}$ on $y_t$; if your goal is prediction, you do not actually need it. In the former case, the $\omega$ function has to be scaled to sum to one for $\beta$ to be identified.
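To make the dimension reduction concrete, here is a sketch of the normalized exponential Almon weight function: two parameters generate the entire $m$-dimensional weight profile (the specific $\theta$ values below are just illustrative):

```python
import numpy as np

def exp_almon(theta1, theta2, m):
    """Normalized exponential Almon lag weights for m hf lags.

    w_j is proportional to exp(theta1*j + theta2*j**2), scaled to sum
    to one so that the slope coefficient beta is identified.
    """
    j = np.arange(m, dtype=float)
    w = np.exp(theta1 * j + theta2 * j ** 2)
    return w / w.sum()

# Two parameters pin down all 66 weights on the daily lags of the
# quarterly example: a smooth hump that decays toward distant lags.
w = exp_almon(0.05, -0.01, 66)
# The aggregate regressor for period t is then  w @ X_{t-1},
# and the MIDAS regression estimates (alpha, beta, theta1, theta2).
```

Estimating $(\theta_1, \theta_2)$ jointly with $\beta$ is what makes the problem non-linear, as discussed next.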
To estimate U-MIDAS you use OLS, while for MIDAS you typically need a non-linear least squares (NLS) estimator. There is a very elegant way to avoid NLS estimation (NLS is sometimes problematic) by profiling out the $\theta$ parameter; see [2]. In that case, you only need OLS to estimate the MIDAS regression.
You can also check wiki on MIDAS models: https://en.wikipedia.org/wiki/Mixed-data_sampling
Hope this helps!
Matlab toolbox:
https://nl.mathworks.com/matlabcentral/fileexchange/45150-midas-matlab-toolbox
R package: https://cran.r-project.org/web/packages/midasr/midasr.pdf
Refs:
[1] Andreou, E., Ghysels, E., & Kourtellos, A. (2013). Should macroeconomic forecasters use daily financial data and how?. Journal of Business & Economic Statistics, 31(2), 240-251.
[2] Ghysels, E., & Qian, H. (2019). Estimating MIDAS regressions via OLS with polynomial parameter profiling. Econometrics and Statistics, 9, 1-16.
Best Answer
Yes, you can do that. If you are considering auto.arima(), then simply use forecast(..., h=13) and look at ?forecast.Arima (note the capitalization). You can either forecast your weekly data and aggregate the forecasts, or model and forecast at quarterly granularity. It's hard to say offhand which one will be more accurate. (If you want to be fancy, you can do both and reconcile the forecasts using the MAPA algorithm, Kourentzes et al., 2014; take a look at the MAPA package for R.)
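The tooling above (forecast, MAPA) is R, but the two routes themselves are easy to sketch. Below is a hedged Python illustration on simulated weekly data, with a hand-rolled AR(1) standing in for auto.arima() and a naive quarterly mean standing in for the direct quarterly model; none of the numbers come from real data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated weekly series: AR(1) fluctuating around a level of 100.
weekly = np.empty(104)               # two years of weeks
weekly[0] = 100.0
for t in range(1, 104):
    weekly[t] = 100.0 + 0.6 * (weekly[t - 1] - 100.0) + rng.standard_normal()

# Route A: fit an AR(1) by OLS on the weekly data, produce 13 iterated
# one-step forecasts, then aggregate them into one quarterly figure.
y, x = weekly[1:], weekly[:-1]
A = np.column_stack([np.ones_like(x), x])
c, phi = np.linalg.lstsq(A, y, rcond=None)[0]

fc, last = [], weekly[-1]
for _ in range(13):
    last = c + phi * last
    fc.append(last)
quarterly_from_weekly = sum(fc)

# Route B: aggregate history to quarterly totals (13-week sums) and
# forecast the next quarter directly, here with a naive historical mean.
quarterly_hist = weekly.reshape(8, 13).sum(axis=1)
quarterly_direct = quarterly_hist.mean()
```

On this stationary toy series both routes land near 1300 (13 weeks at a level of 100); on real data you would compare them on a holdout sample, as discussed below.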
If you have only two quarters' worth of data, then yes, that is not much. But then again, that is still just 26 weekly observations, which I would not consider much more reliable for forecasting 13 weeks out.
As above, we can't tell you which approach will be most accurate. Try both aggregating-then-forecasting and forecasting-then-aggregating (and potentially MAPA) and check for yourself which one is better on your particular data, using a holdout sample. I personally believe MAPA is worthwhile, but I won't claim that it always improves accuracy.