Time Series – Understanding Whether to Use Univariate or Multivariate Time Series Analysis

multivariate analysis, time series, univariate

I believe a univariate time series pertains to a single variable changing over time, and a multivariate time series refers to multiple variables (either dependent or independent). However, the following case is unclear to me: there are two independent variables, but one of them does not change over time. It is more of a category (a location in this case).

Is this still classed as a multivariate time series?

Date        Place  Value
01/01/2021  A      1
02/01/2021  A      2
02/01/2021  B      1
03/01/2021  B      3
04/01/2021  C      2

Best Answer

It's panel data on a daily basis. I have linked the original image; the quality is a bit poor, sorry about that. Its explanation of cross-sectional data is not very informative, as it leaves out the time index t, so I added another graphic.

https://image.slidesharecdn.com/timeseriesforecastingaidayminsk11-171016115720/95/time-series-forecasting-3-638.jpg?cb=1508155473

Update

You would treat your data as a multivariate case, where i indexes the individual places and t the date:

$y(i,t) = a + b \cdot x(i,t) + \varepsilon(i,t)$

The linked source, especially table 3 and the equation that follows it, shows how to build a regression equation from a model with panel data, where the i in table 3 corresponds to your place. You are thus exploring how the value changes in the different places over time. As the source states, panel data is more like a movie, whereas cross-sectional data is a snapshot.

Once you have dealt with the panel structure of your data, you have to decide whether to use a fixed-effects or a random-effects model, because ε(i,t) can vary in different ways, see here:

https://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-and-mixed-effect-mode

also:

https://www.meta-analysis.com/downloads/Meta-analysis%20Fixed-effect%20vs%20Random-effects%20models.pdf
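
As a rough illustration of the fixed-effects idea for the model y(i,t) = a + b * x(i,t) + ε(i,t), here is a minimal sketch that gives each place its own intercept by subtracting the place-level means before estimating b (the "within" transformation). The regressor x, the numbers and the function name `within_estimator` are all made up for illustration; the data in the question has no x column.

```python
from collections import defaultdict

# Hypothetical long-format panel: (place, t, x, y).
# The values are invented and constructed so that both places share one slope.
panel = [
    ("A", 1, 1.0, 3.0),
    ("A", 2, 2.0, 5.0),
    ("A", 3, 3.0, 7.0),
    ("B", 1, 1.0, 12.0),
    ("B", 2, 2.0, 14.0),
    ("B", 3, 4.0, 18.0),
]


def within_estimator(rows):
    """Estimate b in y(i,t) = a_i + b * x(i,t) + e(i,t) by demeaning per place."""
    groups = defaultdict(list)
    for place, _t, x, y in rows:
        groups[place].append((x, y))

    sxy = 0.0  # sum of demeaned x times demeaned y, pooled over places
    sxx = 0.0  # sum of squared demeaned x, pooled over places
    means = {}
    for place, obs in groups.items():
        x_mean = sum(x for x, _ in obs) / len(obs)
        y_mean = sum(y for _, y in obs) / len(obs)
        means[place] = (x_mean, y_mean)
        for x, y in obs:
            sxy += (x - x_mean) * (y - y_mean)
            sxx += (x - x_mean) ** 2

    b = sxy / sxx
    # Place-specific intercepts recovered from the group means.
    intercepts = {p: y_mean - b * x_mean for p, (x_mean, y_mean) in means.items()}
    return b, intercepts


b, intercepts = within_estimator(panel)
print("common slope b:", b)
print("place intercepts:", intercepts)
```

A random-effects model would instead treat the place intercepts as draws from a common distribution; which one fits better depends on your data, so take this only as a starting point.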

A multivariate time series model, in contrast, is a VAR/VECM, in which all variables are treated as dependent variables. That is a fundamental difference. You could reshape your data so that each place becomes its own variable observed at every time point and estimate a VAR on it; that would be an example of a multivariate time series.

For example, imagine that places A, B and C in your data are different geographical locations with rising tax indices (let's assume t were measured in years rather than days). In a VAR you could see how place B raises its tax after A, and C after B. This is a possible Granger causality in the time series, and one of the main concepts of a VAR, i.e. of a multivariate time series. Because all variables depend on each other, this differs from panel data.

Date        Place_A  Place_B  Place_C
01/01/2021  1        0        0
02/01/2021  2        1        0
03/01/2021  0        3        0
04/01/2021  0        0        2
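
The wide table above can be produced from the long format in the question with a small reshape. This is only a sketch: the helper name `long_to_wide` is hypothetical, and I am assuming the dates are DD/MM/YYYY and that a missing (date, place) pair should be filled with 0, as in the table.

```python
from datetime import datetime

# Long-format rows from the question: (date, place, value).
long_rows = [
    ("01/01/2021", "A", 1),
    ("02/01/2021", "A", 2),
    ("02/01/2021", "B", 1),
    ("03/01/2021", "B", 3),
    ("04/01/2021", "C", 2),
]


def long_to_wide(rows, fill=0):
    """Return (dates, places, matrix) with one row per date and one column per place."""
    places = sorted({place for _, place, _ in rows})
    # Sort chronologically, assuming the DD/MM/YYYY format.
    dates = sorted(
        {date for date, _, _ in rows},
        key=lambda d: datetime.strptime(d, "%d/%m/%Y"),
    )
    lookup = {(date, place): value for date, place, value in rows}
    # Missing combinations get the fill value (0 here, which is an assumption).
    matrix = [[lookup.get((d, p), fill) for p in places] for d in dates]
    return dates, places, matrix


dates, places, wide = long_to_wide(long_rows)
print("Date", places)
for d, row in zip(dates, wide):
    print(d, row)
```

Filling the gaps with 0 mirrors the table above; whether a missing observation really means zero, rather than "not measured", is something you would need to check for your data before fitting a VAR.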

Update 2

One last comment on dealing with data points that depend on time: if your data is time dependent, as in my example, you would do 'multi-step forecasting of several steps', as the author of the linked article states. The 80%/20% train/test rule only holds true if you don't shuffle your data. If you want to shuffle the data, you have to insert dummies for lags and time points; you may then also use gradient boosting methods for forecasting, as you normally would in ML. https://towardsdatascience.com/ml-time-series-forecasting-the-right-way-cbf3678845ff
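
To make the two points above concrete, here is a minimal sketch of an unshuffled 80/20 split and of building lag features. The series and the helper names `chronological_split` and `make_lag_features` are hypothetical and not taken from the linked article.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series without shuffling: the earliest part trains, the latest part tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_lag_features(series, n_lags=2):
    """Build (lags, target) rows, where lags are the n_lags values preceding the target."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows


# Hypothetical series, already ordered by time.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

train, test = chronological_split(values)
lagged = make_lag_features(values, n_lags=2)

print("train:", train)
print("test:", test)
print("first lagged row:", lagged[0])
```

Once every row carries its own lags, the temporal information lives in the features rather than in the row order, which is what makes a general-purpose learner such as gradient boosting usable here. You should still take care that no test row precedes the training rows in time.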